DataKit: Orchestrate applications using a Git-like dataflow
DataKit is a tool to orchestrate applications using a Git-like dataflow. It revisits the UNIX pipeline concept, with a modern twist: streams of tree-structured data instead of raw text. DataKit allows you to define complex build pipelines over version-controlled data.
Weekly DataKit dev report for 2017-04-17 to 2017-04-23 (week 16)
TL;DR: The effort this week has gone into preparing for a release of the development
trees next week, and for the renaming of the project repositories to the
Moby Project (see week17 for more on this topic).
The use of multi-stage builds vastly decreased the size of the published containers to
just 21MB, which makes deploying DataKit much more efficient!
Build and Packaging:
- The default Dockerfile now uses multi-stage builds to vastly shrink the size of the DataKit containers from 1.4-1.7GB to 21-22MB! (moby/datakit#522 @talex5).
- Since the Docker Hub doesn’t support multi-stage container builds yet, the Dockerfile
used for autobuilds was inlined to ensure that the published images stay in sync. (moby/datakit#521 @talex5).
- The root
opamfile is now called
datakit.opamto make it fit with the other sub-packages (moby/datakit#525 @talex5).
- The repositories and self-ci example were update for new repository locations (moby/datakit#528 and moby/datakit#520 @talex5).
- Windows CI of DataKit was fixed up to account for recent changes (moby/datakit#527 moby/datakit#122 @talex5 @samoht).
- The GitHub bridge can now read its private key using Docker Swarm secret management, which avoids the need to spread the secret authentication token any further than necessary (moby/datakit#519 @talex5 @avsm).
- The DataKit server no longer exposes an HTTP server. It was only used for debugging before, and Irmin 1.0 no longer supports it (moby/datakit#524 @samoht @talex5).
- As DataKit is used more in production, @talex5 has been steadily improving error handling to ensure that callers can handle failures more gracefully, either via retrying or logging exceptions (moby/datakit#526 @talex5).
PRs with activity
- @djs55 uses DataKit in Docker for Mac for configuration management, and so he proposed a scheme to make branch handling more robust for real-world use of this feature. His PR covers the case of software upgrades and user-supplied overrides for particular configuration keys (moby/datakit#523).
- @avsm requested that DataKit CI be able to monitor a complete GitHub organisation and add monitoring hooks by watching the rigth events. @samoht proposed a fix to this in moby/datakit#419, but it has become outdated due to upstream changes, so he is rebasing it.
- @samoht is also working towards making DataKit log less verbose commit messages, to reduce the size of the state repository (moby/datakit#476 @samoht).
External Links or Blogs
- “Optional Dependencies considered harmful” by Rudi Grinberg explains very well why DataKit and Irmin now have explicit OPAM packages rather than
depoptsfor optional functionality.
- “Introducing Docker multistage builds” explains how to achieve the extreme container shrinkage in your own projects.
Other reports in this series can be browsed directly in the repository at moby/datakit:/reports.