[ANN] First release of Docteur, an opinionated read-only file-system for MirageOS

I’m glad to announce the first release of docteur, a simple tool to make and use (read-only) a “file-system” for MirageOS. As you know, with MirageOS, we don’t have sockets, kernel space or even file descriptors. It’s not possible to manipulate files directly from within a unikernel, and many primitives commonly available with the Unix module don’t exist in our space.

This makes it difficult to imagine building a website that displays local files, or a database system. But in our spirit of separation of services, it becomes possible for your unikernel to communicate over the network with a “file system” or a database.

For quite some time we have been experimenting with Git as a file system external to our unikernel. This is the case of pasteur, which saves the pastes in a Git repository. It is also the case of unipi or Canopy, which display the content of a Git repository and can resynchronize with it using a hook. It is the case, too, of our primary DNS server, whose zone file comes from a Git repository - we can then trace all the changes to this file.

However, this approach has two limitations:

  1. the Git repository has to be loaded into the memory of your unikernel
  2. it requires network communication (external with GitHub or internal in a private network)

The persistent aspect is very important. We should always be able to launch a unikernel and not lose the data if our system shuts down.

The mutable aspect (modifying a file) is useful in some cases but not in others. For unipi, for example (a simple static web site), the difference between resynchronizing with a hook and restarting the unikernel with a new version of your file system is minor.

Docteur as a second solution

This is where Docteur comes in. It solves both of our problems by generating a file-system image from scratch, from either:

  • a Git repository (local or available on a service)
  • a specific folder

Docteur is able to create a complete representation of a folder and to compress it so well that the generated documentation of several OPAM packages, with all their versions, weighing 14 GB is reduced to an image of only 280 MB (a ratio of about 50:1)!

Such a high compression ratio is due in particular to a double level of compression by decompress and duff. In more detail, Docteur simply generates a slightly modified PACK file with carton.

Then, Docteur provides a simple library which offers two ways to manipulate this image from your unikernel:

  1. a way that is fast but with a significant boot time
  2. a slower way but with no cost to the boot time

The first way simply “analyzes” the image to re-extract the layout of your file system. It then stores this layout in an ART (adaptive radix tree) data structure. Judging by the ART benchmarks, any specific file you ask for can then be reached very quickly.

The drawback is the analysis, which takes place at boot time and can take a very long time (it depends essentially on the number of files you have). There can also be an impact on memory usage, as the ART data structure lives in memory - the more files there are, the bigger the structure is.

The second method is more “silly”. Each time you request a file, we have to rebuild the entire path and therefore deserialize several objects (such as folders). The advantage is that we don’t analyze the image and we don’t try to maintain a layout of your file system.
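To make the trade-off between the two ways concrete, here is a minimal sketch in plain OCaml. It is not Docteur's code or API: the `entry` type, `lookup_walk`, `build_index` and `lookup_index` are hypothetical names, the "image" is just an in-memory tree, and the standard `Hashtbl` stands in for the ART structure.

```ocaml
(* A toy file system: hypothetical stand-in for the objects of an image. *)
type entry =
  | File of string                 (* file contents *)
  | Dir of (string * entry) list   (* directory listing *)

(* Second way: no index. Every lookup walks the tree from the root,
   visiting one directory per path segment. *)
let rec lookup_walk entry segments =
  match entry, segments with
  | File contents, [] -> Some contents
  | Dir children, name :: rest ->
    (match List.assoc_opt name children with
     | Some child -> lookup_walk child rest
     | None -> None)
  | _ -> None

(* First way: pay once at boot to index every file by its full path.
   Hashtbl is only a stand-in for ART (assumption of this sketch). *)
let build_index root =
  let index = Hashtbl.create 16 in
  let rec go prefix = function
    | File contents -> Hashtbl.replace index prefix contents
    | Dir children ->
      List.iter (fun (name, child) -> go (prefix ^ "/" ^ name) child) children
  in
  go "" root;
  index

let lookup_index index path = Hashtbl.find_opt index path

let root =
  Dir [ "README.md", File "# Docteur";
        "src", Dir [ "main.ml", File "let () = ()" ] ]

let () =
  (* The whole tree is traversed here, before any file is served. *)
  let index = build_index root in
  assert (lookup_index index "/src/main.ml" = Some "let () = ()");
  (* Nothing is precomputed here; the cost is paid on each request. *)
  assert (lookup_walk root [ "src"; "main.ml" ] = Some "let () = ()");
  assert (lookup_walk root [ "missing" ] = None)
```

In this sketch `build_index` touches every file once, which mirrors the boot-time and memory cost of the first way, while `lookup_walk` keeps nothing in memory but repeats the traversal for each request.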

Example

Docteur is meant to be simple. Generating the image is done with the docteur.make command:

$ docteur.make -b refs/heads/main https://github.com/dinosaure/docteur disk.img
$ docteur.make -b refs/heads/main git@github.com:dinosaure/docteur disk.img
$ docteur.make -b refs/heads/main git://github.com/dinosaure/docteur disk.img
$ docteur.make -b refs/heads/main file://$(pwd)/dev/docteur disk.img

Then, Docteur supports two targets: Unix and Solo5. For Unix, you just have to explicitly name the image file to use. For Solo5 (and thus virtualization), you just have to choose a name for a “block device” and reuse this name with the Solo5 “tender” to specify where the image is.

$ cd unikernel
$ mirage configure -t unix --disk disk.img
$ make depends
$ mirage build
$ ./simple --filename README.md

$ cd unikernel
$ mirage configure -t hvt --disk docteur
$ make depends
$ mirage build
$ solo5-hvt --block:docteur=disk.img -- simple.hvt --filename README.md

Finally, Docteur provides another tool that checks (and analyzes) an image to give you the commit it was built from (if the image comes from a Git repository) or the hash of your file system, computed as a Merkle tree.

$ docteur.verify disk.img
commit	: ad8c418635ca6683177c7ff3b583e1ea5afea78f
author	: "Calascibetta Romain" <romain.calascibetta@gmail.com>
root	: bea10b6874f51e3f6feb1f9bcf3939933b2c4540

Merge pull request #11 from dinosaure/fix-tree-expanding

Fix how we expand our file-system
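For readers unfamiliar with the idea behind such a root hash, here is a minimal sketch of a Merkle tree over a toy file system. It is an illustration only, not what docteur.verify or Git computes: the `entry` type and `hash` function are hypothetical, the serialization is made up for this example, and the standard library's `Digest` (MD5) stands in for the hash actually used.

```ocaml
type entry =
  | File of string                 (* file contents *)
  | Dir of (string * entry) list   (* directory listing *)

(* The hash of a directory is computed from the names and hashes of its
   children, so the root hash depends on every file below it. *)
let rec hash = function
  | File contents ->
    (* Tag the payload to keep files and directories distinct. *)
    Digest.string ("blob\000" ^ contents)
  | Dir children ->
    (* Sort by name so the listing order does not affect the result. *)
    let sorted =
      List.sort (fun (a, _) (b, _) -> String.compare a b) children in
    let lines =
      List.map
        (fun (name, child) -> name ^ "\000" ^ Digest.to_hex (hash child))
        sorted in
    Digest.string ("tree\000" ^ String.concat "\n" lines)

let v1 = Dir [ "a.txt", File "hello"; "sub", Dir [ "b.txt", File "world" ] ]
let v2 = Dir [ "sub", Dir [ "b.txt", File "world" ]; "a.txt", File "hello" ]
let v3 = Dir [ "a.txt", File "hello"; "sub", Dir [ "b.txt", File "World" ] ]

let () =
  assert (hash v1 = hash v2);    (* same content, different listing order *)
  assert (hash v1 <> hash v3);   (* one changed byte changes the root hash *)
  print_endline (Digest.to_hex (hash v1))
```

The point of the construction is that a single value identifies the whole tree: comparing two root hashes is enough to tell whether two images hold the same file system.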

Conclusion

People often ask me what the purpose of MirageOS is, expecting something like a website or a particular service. I think that Docteur shows one essential thing about MirageOS: it is a tool and an ecosystem, not an endpoint embodied in one specific application.

Docteur is not THE solution to our problems; it answers a specific use case. What is important to note is not what Docteur does, but that our ecosystem and our tools made the development of Docteur possible, just as they make possible the development of a trillion other applications!

As such, I encourage those people to “play” with MirageOS if they want to learn more. Our goal is not to show you applications that you could then deploy easily (even if we are working on this aspect too) but to give you the possibility to imagine your own OS (independently of our vision)!

And if you try, we’ll be happy to help you!
