DataKit: Orchestrate applications using a Git-like dataflow
DataKit is a tool to orchestrate applications using a Git-like dataflow. It revisits the UNIX pipeline concept, with a modern twist: streams of tree-structured data instead of raw text. DataKit allows you to define complex build pipelines over version-controlled data.
DataKit is currently used as the coordination layer for HyperKit, the hypervisor component of Docker for Mac and Windows, and for the DataKitCI continuous integration system.
The development repository is https://github.com/moby/datakit
Community Slack: #datakit on https://dockercommunity.slackarchive.io
Weekly DataKit dev report for 2017-04-17 to 2017-04-23 (week 16)
This report covers weekly developments in the moby/datakit, mirage/irmin,
mirage/ocaml-git, and mirage/ocaml-9p repositories.
TL;DR: The effort this week has gone into preparing for a release of the development
trees next week, and for the renaming of the project repositories to the
Moby Project (see week17 for more on this topic).
The use of multi-stage builds vastly decreased the size of the published containers to
just 21MB, which makes deploying DataKit much more efficient!
Build and Packaging:
- The default Dockerfile now uses multi-stage builds to vastly shrink the size of the DataKit containers from 1.4-1.7GB to 21-22MB! (moby/datakit#522).
- Since the Docker Hub doesn't support multi-stage container builds yet, the Dockerfile used for autobuilds was inlined to ensure that the published images stay in sync (moby/datakit#521).
- The root opam file is now called datakit.opam to make it fit with the other sub-packages (moby/datakit#525).
- The repositories and self-ci example were updated for the new repository locations (moby/datakit#528 and moby/datakit#520).
- Windows CI of DataKit was fixed up to account for recent changes (moby/datakit#527 @talex5).
- The GitHub bridge can now read its private key using Docker Swarm secret management, which avoids the need to spread the secret authentication token any further than necessary (moby/datakit#519 @avsm).
- The DataKit server no longer exposes an HTTP server. It was only used for debugging before, and Irmin 1.0 no longer supports it (moby/datakit#524 @talex5).
- As DataKit is used more in production, @talex5 has been steadily improving error handling to ensure that callers can handle failures more gracefully, either via retrying or logging exceptions (moby/datakit#526).
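The "retry or log" pattern mentioned in the last item can be illustrated with a small, generic sketch. This is not DataKit's code (DataKit is written in OCaml), and it does not reflect what moby/datakit#526 actually changes. The names `with_retries`, `attempts`, `delay` and `on_error` are hypothetical and chosen only for illustration.

```python
import time


def with_retries(operation, attempts=3, delay=0.0, on_error=None):
    """Call operation() up to `attempts` times.

    Hypothetical helper for illustration only. Each failure is passed to
    `on_error` (for example, a logger) and the last exception is re-raised
    once all attempts are exhausted, so the caller decides what to do next.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            if on_error is not None:
                on_error(attempt, exc)  # log instead of failing silently
            if attempt < attempts:
                time.sleep(delay)  # wait before the next attempt
    raise last_error
```

A caller might wrap a flaky operation as `with_retries(fetch, attempts=5, delay=1.0, on_error=log_failure)`, where `fetch` and `log_failure` are placeholders for the caller's own functions.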
PRs with activity
- @djs55 uses DataKit in Docker for Mac for configuration management, and so he proposed a scheme to make branch handling more robust for real-world use of this feature. His PR covers the case of software upgrades and user-supplied overrides for particular configuration keys (moby/datakit#523).
- @avsm requested that DataKit CI be able to monitor a complete GitHub organisation and add monitoring hooks by watching the right events. @samoht proposed a fix for this in moby/datakit#419, but it has become outdated due to upstream changes, so he is rebasing it.
- @samoht is also working towards making DataKit log less verbose commit messages, to reduce the size of the state repository (moby/datakit#476).
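To make the configuration-management item above more concrete, here is a minimal sketch of one way that shipped defaults and user-supplied overrides could be layered so that overrides survive a software upgrade. It is an assumption made for illustration, not the branch scheme proposed in moby/datakit#523. The function names and configuration keys are hypothetical.

```python
def effective_config(defaults, overrides):
    """Return the defaults with user overrides applied on top.

    Hypothetical model: `defaults` is what the software ships and
    `overrides` holds only the keys the user changed explicitly.
    """
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def upgrade(new_defaults, overrides):
    """Simulate an upgrade: the defaults are replaced, the overrides are kept."""
    return effective_config(new_defaults, overrides)


# Illustrative keys only; they are not real Docker for Mac settings.
v1_defaults = {"cpus": 2, "memory_gb": 2}
user_overrides = {"memory_gb": 8}
v2_defaults = {"cpus": 4, "memory_gb": 2, "swap_gb": 1}

before = effective_config(v1_defaults, user_overrides)
after = upgrade(v2_defaults, user_overrides)
```

In this model an upgrade can change any default and add new keys, while a key the user has overridden keeps the user's value.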
External Links or Blogs
Other reports in this series can be browsed directly in the repository at moby/datakit:/reports.