DataKit dev reports



DataKit: Orchestrate applications using a Git-like dataflow

DataKit is a tool to orchestrate applications using a Git-like dataflow. It revisits the UNIX pipeline concept, with a modern twist: streams of tree-structured data instead of raw text. DataKit allows you to define complex build pipelines over version-controlled data.

DataKit is currently used as the coordination layer for HyperKit, the hypervisor component of Docker for Mac and Windows, and for the DataKitCI continuous integration system.

Development repository is
Community Slack: #datakit on

Weekly DataKit dev report for 2017-04-17 to 2017-04-23 (week 16)

This report covers weekly developments in the moby/datakit, mirage/irmin,
mirage/ocaml-git, and mirage/ocaml-9p repositories.

TL;DR: The effort this week has gone into preparing for a release of the development
trees next week, and for the renaming of the project repositories to the
Moby Project (see week17 for more on this topic).
The use of multi-stage builds vastly decreased the size of the published containers to
just 21MB, which makes deploying DataKit much more efficient!

PRs merged

Build and Packaging:

Functionality improvements:

  • The GitHub bridge can now read its private key using Docker Swarm secret management, which avoids the need to spread the secret authentication token any further than necessary (moby/datakit#519 @talex5 @avsm).
  • The DataKit server no longer exposes an HTTP server. It was only used for debugging before, and Irmin 1.0 no longer supports it (moby/datakit#524 @samoht @talex5).
  • As DataKit is used more in production, @talex5 has been steadily improving error handling to ensure that callers can handle failures more gracefully, either via retrying or logging exceptions (moby/datakit#526 @talex5).
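
Docker Swarm exposes each secret to a service as a file under /run/secrets/ inside the container. The sketch below shows, under that assumption, how a service might load a key from such a file. The function name and the secret name in the usage comment are hypothetical, and this is not the GitHub bridge's actual code.

```python
from pathlib import Path

# Docker Swarm mounts each secret as a file in this directory (Linux containers).
SECRETS_DIR = Path("/run/secrets")


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> str:
    """Return the contents of a mounted secret, without trailing newlines.

    `name` is the name the secret was given when it was created.
    """
    path = secrets_dir / name
    if not path.is_file():
        raise FileNotFoundError(f"secret {name!r} is not mounted at {path}")
    return path.read_text().rstrip("\n")


# Hypothetical usage inside a container that was granted the secret:
#   key = read_secret("github-bridge-key")
```

Reading the key from a mounted file keeps it out of command-line arguments and environment variables, which is the point of the change described above.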

PRs with activity

  • @djs55 uses DataKit in Docker for Mac for configuration management, and so he proposed a scheme to make branch handling more robust for real-world use of this feature. His PR covers the case of software upgrades and user-supplied overrides for particular configuration keys (moby/datakit#523).
  • @avsm requested that DataKit CI be able to monitor a complete GitHub organisation and add monitoring hooks by watching the right events. @samoht proposed a fix for this in moby/datakit#419, but it has become outdated due to upstream changes, so he is rebasing it.
  • @samoht is also working towards making DataKit log less verbose commit messages, to reduce the size of the state repository (moby/datakit#476 @samoht).
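
The idea in the first item above, keeping user-supplied overrides apart from shipped defaults so that an upgrade can change one without losing the other, can be sketched roughly as follows. Plain dictionaries stand in for the stored values, and the key names are made up for illustration. The actual scheme in moby/datakit#523 may differ.

```python
def effective_config(defaults: dict, overrides: dict) -> dict:
    """User-supplied overrides win; every other key falls back to the defaults."""
    return {**defaults, **overrides}


def upgrade(new_defaults: dict, overrides: dict) -> dict:
    """An upgrade replaces the defaults wholesale but leaves the user's
    overrides untouched, so they still apply afterwards."""
    return effective_config(new_defaults, overrides)


# Hypothetical keys, for illustration only.
v1_defaults = {"cpus": 2, "memory_mb": 2048}
user_overrides = {"memory_mb": 4096}
v2_defaults = {"cpus": 4, "memory_mb": 2048, "swap_mb": 1024}

print(effective_config(v1_defaults, user_overrides))
# {'cpus': 2, 'memory_mb': 4096}
print(upgrade(v2_defaults, user_overrides))
# {'cpus': 4, 'memory_mb': 4096, 'swap_mb': 1024}
```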

External Links or Blogs

Other reports in this series can be browsed directly in the repository at moby/datakit:/reports.


Weekly DataKit dev report for 2017-04-24 to 2017-04-30 (week 17)

This report covers weekly developments in the moby/datakit, mirage/irmin, mirage/ocaml-git and mirage/ocaml-9p repositories.

TL;DR: The project now has a new home in the Moby Project and the main DataKit repository was renamed. There were three releases this week, of moby/datakit:0.10.0, mirage/irmin:1.1.0 and mirage/ocaml-9p:v0.10.0, with mega build speedups and improvements to the DataKit Continuous Integration engine.

PRs merged

Build and Packaging:

Functionality improvements:

  • DataKit now allows the use of / in branch names (moby/datakit#533 @samoht).
  • The GitHub bridge was simplified by removing the VFS layer, which was intended to be useful for debugging but was rarely used in practice (moby/datakit#535 @samoht).
  • The Continuous Integration subsystem now makes the required GitHub scopes configurable, so that it is easier to deploy on public GitHub repositories with less privilege (moby/datakit#534 @talex5 @avsm), and it accepts SSH keys automatically when autocloning state repositories (moby/datakit#536 @talex5).
  • A regression in Irmin 1.0 in the Irmin.Tree.diff function, where nested diffs were reported with the wrong path, was fixed (mirage/irmin#438 @samoht).
  • Irmin can now specify branches in URLs for the fetch function (mirage/irmin#432 @samoht).
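
To illustrate what "the right path" means for a nested diff, here is a small generic sketch of a recursive tree diff over nested dictionaries. It is not Irmin's implementation and the names are hypothetical. The point is that each recursive call must carry the full prefix, otherwise changes deep in the tree are reported relative to the wrong node.

```python
def diff_tree(old: dict, new: dict, prefix: tuple = ()) -> list:
    """Return (path, change, value) triples; path is the full path from the root."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        path = prefix + (key,)
        if key not in new:
            changes.append((path, "removed", old[key]))
        elif key not in old:
            changes.append((path, "added", new[key]))
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            # Recurse with the extended prefix so nested changes keep their full path.
            changes.extend(diff_tree(old[key], new[key], path))
        elif old[key] != new[key]:
            changes.append((path, "updated", new[key]))
    return changes


old = {"a": {"b": {"c": 1}}, "x": 1}
new = {"a": {"b": {"c": 2, "d": 3}}}
for change in diff_tree(old, new):
    print(change)
# (('a', 'b', 'c'), 'updated', 2)
# (('a', 'b', 'd'), 'added', 3)
# (('x',), 'removed', 1)
```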

Thanks to @samoht, @avsm and @djs55 for handling all the release activity this week in moby/datakit#538, moby/datakit#540, moby/datakit#541, moby/datakit#542 and mirage/ocaml-9p#120.

PRs closed this week without merge

  • @kayceesrk and @samoht have been discussing the semantics of merging option types in mirage/irmin#421, and concluded that Irmin 1.0.1 has the right semantics. mirage/irmin#422 was closed as a result.

  • @yomimono and @samoht did a lot of work in the run-up to the MirageOS 3.0 release a few months ago to prepare DataKit for support, and all of that functionality is now present in DataKit 0.10, so moby/datakit#433 and moby/datakit#459 were closed.

  • While building the mirage-ci, @avsm had reported that invalid job names from the CI resulted in inscrutable errors in the web interface. After some discussion, @talex5 decided that moby/datakit#492 wasn’t worth the complexity, and that it should be addressed elsewhere.

  • @samoht had put together a design for a dkt CLI tool in moby/datakit#146. While there were many good ideas in the design, it has never been implemented fully, so the PR is closed. Get in touch with @samoht if you’d like to build it!

Ongoing activity

External Links or Blogs

Other reports in this series can be browsed directly in the repository at moby/datakit:/reports.


Weekly DataKit dev report for 2017-05-01 to 2017-05-07 (week 18)

This report covers weekly developments in the moby/datakit, mirage/irmin, mirage/ocaml-git and mirage/ocaml-9p repositories.

TL;DR: It’s been a quiet week after the previous week’s slew of releases, with mainly packaging fixes and debugging deployments of DataKit CI in LinuxKit.

PRs merged

Build and Packaging:

  • The automated release infrastructure was improved to support Jbuilder (moby/datakit#543 @samoht).
  • The GitHub bridge version constraints were fixed to improve OPAM installation (moby/datakit#544 @samoht).
  • The test harnesses were also extended to test on OCaml 4.03.0, which is the new minimum supported version of the compiler as of MirageOS 3.0 (moby/datakit#546 @samoht).

Functionality improvements:

Ongoing activity

External Links or Blogs

Other reports in this series can be browsed directly in the repository at moby/datakit:/reports.


Weekly DataKit dev report for 2017-05-08 to 2017-05-14 (week 19)

This report covers weekly developments in the moby/datakit, mirage/irmin, mirage/ocaml-git and mirage/ocaml-9p repositories.
This week also saw @dinosaure and @eyyub release two new libraries that replace functionality formerly written in C with pure OCaml implementations:

  • Digestif is a standalone cryptographic hashing library.
  • Decompress is a zlib implementation in pure OCaml.

Both are now in the Mirage GitHub organisation and will be integrated into Irmin and DataKit over the next few months. The first step is to replace the use of camlzip fully, and then switch ocaml-git from using Nocrypto and GMP to the pure OCaml digestif instead. Irmin and DataKit support for both will follow after that.
There is also an experimental new Git packfile encoder and decoder called Sirodepac to let ocaml-git perform compression of repositories more easily, also by @dinosaure.

There has also been significant progress this week on having a filesystem that Irmin and DataKit can use when compiled as unikernels (and hence only have access to a raw block device):

  • Tom Ridge announced the initial release of a formally verified btree filesystem. It still needs to be integrated with the MirageOS filesystem interfaces, so volunteers who want to glue things together are welcome to participate.
  • Gabriel de Perthuis also responded to note that he is continuing to work on a flash-optimised pure OCaml filesystem (using hitchhiker trees). It is not quite ready to open-source yet, but he anticipates doing so in the next month or so.

Between both of these efforts, it looks like a fully unikernel-aware, type-safe DataKit that persists onto a block storage device is not too far away.

Finally, we welcome David Udelson, a junior from Cornell who has been selected as a Google Summer of Code student to work on a REST API for Irmin!


This week saw a minor moby/datakit:0.10.1 release of DataKit to support the latest versions of the 9P and Lwt libraries. Both of these have some backwards-incompatible changes, so the 0.10.1 release of DataKit lets us use the latest features.

PRs merged

The Jbuilder build porting journey continues to be successful:

Ongoing activity

  • Work continues on switching to Digestif instead of Nocrypto (which has C stubs) (git#214 @dinosaure @samoht).
  • The 9P interfaces are being made safer against leaking exceptions on read/write (9p#126 9p#125).
  • The tests, client and server code have been refactored to make the transport layer abstract. The default is still to use 9P, but this is the first step towards replacing it with gRPC (datakit#551 @samoht).
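
As a rough illustration of what making the transport layer abstract buys, the sketch below writes client logic once against a small interface and plugs in a backend separately. Everything here is hypothetical: the interface, its method names and the in-memory backend are stand-ins, not DataKit's API (DataKit itself is written in OCaml).

```python
from typing import Protocol


class Transport(Protocol):
    """Hypothetical minimal interface that client code depends on."""

    def read(self, path: str) -> bytes: ...

    def write(self, path: str, data: bytes) -> None: ...


class InMemoryTransport:
    """Stand-in backend; a 9P or gRPC backend would expose the same methods."""

    def __init__(self) -> None:
        self._files = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data


def set_and_check(transport: Transport, path: str, data: bytes) -> bool:
    """Client logic written once against the interface, not a concrete backend."""
    transport.write(path, data)
    return transport.read(path) == data


print(set_and_check(InMemoryTransport(), "/branch/master/README", b"hello"))
# True
```

Tests benefit in the same way: they can run against any backend that satisfies the interface.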

Other reports in this series can be browsed directly in the repository at moby/datakit:/reports.


Thomas - thank you for posting these. Very useful, and much appreciated.


@samoht this is really excellent, thanks for posting. Regarding the format of these reports, perhaps a good structure would be for the main post topic to briefly explain what DataKit and the associated repositories are (from the README), and then you can just add a weekly comment with the latest dev report?

I’ll also look into generating MirageOS dev reports in the same style, and perhaps we can add a category for updates so that other projects can also participate with their own.


That makes sense, I’ve edited the first post to add a bit of context.


Weekly DataKit dev report for 2017-05-15 to 2017-05-28 (weeks 20 and 21)

This report covers two weeks of developments in the moby/datakit, mirage/irmin,
mirage/ocaml-git and mirage/ocaml-9p repositories.

These two weeks saw DataKit add direct client bindings that use Git directly, and
thus do not require a 9P server to be running. This improves DataKit's performance
and simplifies its deployment, but removes the intermediate flexibility that
a 9P filesystem offers.

DataKit also features a local filesystem bridge, in addition to the remote
GitHub one, to facilitate desktop development with local repositories. This
filesystem bridge was improved in this period by using the latest Irmin release,
which has better watch support (moby/datakit#577).

Most of these improvements are still in the various master branches of
Irmin and DataKit, and will appear in a release near you soon.

If you are interested in contributing to any of these repositories, then
mirage/irmin#415 has a thread about coding standards and getting started
with the Irmin REST API.

PRs merged

Client API:

  • Golang: separate user config from defaults in the database (moby/datakit#523 @djs55 @talex5).
  • Golang: add List func in the Snapshot module to get files in a directory (moby/datakit#578 @ebriney).
  • OCaml/9P: Remove rename API calls in the client, as there were no users and it doesn’t work with 9P directories (moby/datakit#563 @talex5 @samoht)
  • OCaml/Git: Fix the exists function in the new Git client API (moby/datakit#576 @samoht).
  • OCaml: Renamed Datakit_path into Datakit_client.Path and Datakit_S into Datakit_client.S, to create a Datakit_client namespace (moby/datakit#558 @samoht)
  • OCaml: Added client bindings that use Git directly without a 9P server process (moby/datakit#559 @samoht)
  • Use Irmin.Merge.idempotent instead of re-implementing our own (moby/datakit#564 @samoht)
  • DK.commit now fails if the commit does not exist (moby/datakit#565 @samoht)
  • Build the 9pmount helper in the datakit-client-9p library instead of datakit-client (moby/datakit#573 @samoht).
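
For readers unfamiliar with the idea behind an idempotent merge (merging a value with an identical value yields that value), here is a minimal generic three-way sketch. It is not Irmin's API or implementation. The function name is hypothetical, and the fallback behaviour when the two sides differ is an assumption based on an ordinary three-way merge.

```python
class Conflict(Exception):
    """Raised when both sides changed the value in different ways."""


def merge_idempotent(old, a, b):
    """Three-way merge of values `a` and `b` with common ancestor `old`."""
    if a == b:
        return a  # both sides agree: merging x with x gives x
    if a == old:
        return b  # only b changed since the ancestor
    if b == old:
        return a  # only a changed since the ancestor
    raise Conflict(f"cannot merge {a!r} and {b!r}")


print(merge_idempotent("v1", "v2", "v2"))  # v2
print(merge_idempotent("v1", "v1", "v3"))  # v3
```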




Other reports in this series can be browsed directly in the repository at moby/datakit:/reports.


Weekly DataKit dev report for 2017-05-29 to 2017-06-04 (week 22)

This report covers weekly developments in the datakit, irmin, git and 9p repositories.

This week saw a significant simplification in how the GitHub bridge is deployed, as it can now run without a server and use a local Git repository directly (datakit#577 @talex5 @samoht). This in turn simplifies the deployment of DataKit for the various CI services that now use it, such as the LinuxKit Moby project.

The GitHub bridge also adds an owner file to identify the PR creator, which can be used by DataKitCI plugins (datakit#587 @samoht).

For those getting started with DataKit and Irmin, @nickbetteridge started an issue on how to get started (irmin#450).
Our GSoC intern @dudelson is also active on irmin#415 about the HTTP REST API (@dbuenzli @dudelson @samoht), and has started
a tracking issue on contributing documentation improvements (irmin#451). This is massively appreciated!

As part of the move to standardise interfaces across Moby components, work is under way on building gRPC and Cap'n Proto RPC libraries so that DataKit can make use of them. Check in with @talex5 if you are interested in contributing.

Releases are now being cut with this functionality, starting with git:1.11.0.

PRs this week

External Links or Blogs

Other reports in this series can be browsed directly in the repository at datakit:/reports. There is also a Docker Community Slack channel in #datakit.