Boot modes: network boot and hybrid boot

Introduction

In its default mode of operation, WalT nodes boot over the network. This has several pros, described below. However, for some experiments, this mode may not be applicable. In this case, users can fallback to one of the hybrid boot variants.

Network boot mode

In a network booting scenario, the OS is stored on a remote server and is never transfered as a whole to the node(s). Instead, the node sends network requests to the server each time it needs to access a file or directory. In a sense, compared to booting on a local disk, network booting just means using the network controller to read OS files and metadata instead of the disk controller.

In the case of a WalT node, the remote WalT image files and metadata are accessed in read-only mode. In its bootup procedure, the node stacks on top of this remote layer an “overlay” layer implementing copy-on-write semantics and using the RAM for storing modifications. This allows the OS (or the WalT user) to create, modify or delete files even if the remote OS image is accessed in read-only mode.

This setup has the following pros:

  • Near-zero OS deployment delay. Instead of transfering a whole image OS from the server to the node’s disk, as most other platforms do, the server just updates a symlink to target the proper OS image and reboots the node. One could argue that network transfers happen once the node is booting instead, but the method remains very frugal on network resources: only the files and metadata that are actually necessary are transferred, which is typically less than 10% of the total OS image size.

  • Reproducibility ensured at each reboot. Since the modifications are stored using an overlay in RAM, they are cleared when the node reboot, and the node restarts using exactly the same OS files, i.e., those of the read-only remote WalT image.

  • Reduced platform maintenance burden. The local disks or SD cards of the nodes are usually the most fragile components of a platform, so avoiding their use is preferable.

In order to avoid exhausting the RAM space with created and modified files, WalT nodes also mount a special memory swap device at bootup if the WalT OS image supports it (recent debian-based images do). It is based on /dev/nbd0, a “Network Block Device”, so if the node starts swapping, data will actually be stored in a temporary file on the WalT server.

However, please note that swapping can reduce the reproducibility of an experiment and might cause OS issues in extreme situations. So you should avoid getting to this situation by following these tips:

  • Heavy operations on files should be prepared in the walt images, not run on the nodes. You can use walt image shell or walt image build to prepare the OS images with those heavy operations (such as the installation of OS packages). If a part of your setup procedure uses walt node save, it should be called with an image where those preliminary steps have already been applied.

  • Large experiment result files should be generated in the directory ``/persist`` of the node. This directory is excluded from the RAM overlay setup; instead, it is a read-write NFS share, so files stored there are actually stored on the server, and they are preserved accross node reboots.

Sometimes, one may use the local disk(s) of a node as storage for a part of the experiment, and let the network boot mode manage the OS files. It is easy to add local disks to virtual nodes (see walt help show vnode-disks for more info).

However, having the OS stored remotely may not match every possible experiment. The alternate mode described below was designed to overcome this limitation.

Hybrid boot mode

The hybrid boot mode allows to let the node copy the image contents on its local storage and then boot from it.

Whatever the boot mode, the procedure is always network-based at first (i.e. targetting the proper WalT OS image, and then downloading and running the kernel this image contains). But then, walt-init scripts look for local disks (or SD cards) and ext4-formatted partitions. If a directory named walt_hybrid is found in the root directory of one of these partitions, the hybrid boot mode is enabled. Otherwise, the network boot mode continues.

Two hybrid mode variants exist: volatile and persistent.

Similarly to the network boot, the volatile variant properly ensures reproducibility at each reboot. The copy of image contents on the local disk is kept read-only and another sub-directory (of the same disk) is used as a read-write overlay. This overlay directory is reset at each bootup, which leads to a clean boot, free of any artefact from the previous runs, and using only the WalT image contents at first.

The persistent variant behaves differently: all file modifications are preserved accross reboots. Thus reproducibility at each reboot is not ensured in this case.

One can activate the volatile hybrid boot on a physical node by creating a directory named walt_hybrid at the root of an ext4 partition, on a local disk (or SD card) of the node. If an empty file named .persistent is created in this directory walt_hybrid, the boot method switches to the persistent variant.

Using virtual nodes, one just has to select the appropriate template when adding the disk. (See walt help show vnode-disks for more info.)

Important notes:

  • The first time a node boots an image in hybrid mode, the bootup time will be longer (up to several minutes longer, depending on network, disk, and image size) because of the complete copy of the WalT image to the local disk.

  • The default partitionning of SD cards usually involves a single FAT32 partition. Activating the hybrid mode in this case involves reducing the size of this FAT32 partition, creating a second partition in the new space, formatting this second partition using an ext4 filesystem, and creating the walt_hybrid directory.

  • Frequent writes of large OS images can reduce the lifetime of the local storage (this is especially true for non-industrial-grade SD cards!).