Irmin: examples or pointers of interacting with remote repositories (esp using authentication via ssh)


I am having some trouble understanding how to work with remote repos in Irmin, and hoping someone here could help. I have consulted the tutorial docs, the api docs, and the examples, in addition to looking at the source, searching github, and searching this forum, and I have not yet been able to find enough info to help me understand how to do the following:

  • Push/pull from a private repo using SSH authentication
  • Push/pull from any repo using the git protocol (the docs show an example where a remote is created using a git:// URI, but when I try the same scheme with a public repo of my own, I get a “not found” error. The https-based URI works, however.)
  • Register a local repo in the same file system as a remote.

I was hoping someone with expertise in this forum could help me along here, either by sketching out the principles or by pointing me towards some working examples to study!

I plan to pay it forward with anything I learn by trying to help address issue #1597 in mirage/irmin on GitHub ("Add information/example about using `irmin-git` + `Mimic.ctx` to interact with a remote repository") :slight_smile:

Thanks in advance for the help! :pray:

I don’t have any non-MirageOS examples, but if you’re not scared of MirageOS, take a look at roburio/unipi on GitHub ("Serving content from a git repository via HTTPS (including let's encrypt provisioning) as MirageOS unikernel"), which does a clone/update via whatever git transport you like (git, http, ssh).


Thanks for the pointer, @hannes!

I don’t think I am scared of MirageOS; rather, I’m quite curious and hope to dig in at some point. However, my current assumption is that trying to learn Irmin and MirageOS simultaneously will increase, rather than reduce, the difficulty of onboarding.

It looks like all the SSH plumbing in that example is taken care of by Mirage’s git_ssh function. I was hoping it might shed some light on how to configure Irmin, but it seems instead to invoke some printf-based metaprogramming that I find pretty inscrutable at first glance: see mirage/ at commit 530656c812dca63221f4836590886c66dcec70d1 in mirage/mirage on GitHub.

Still, one more lead is helpful, and perhaps I’ll investigate wrapping my current project in Mirage if I can’t work it out otherwise. Though I wonder whether a deeper dive into the Irmin source might be a bit easier, if it comes to that…

The easy response

I’m not sure whether you want to use irmin-unix/git-unix or to make a MirageOS application, but in the first case (and the second as well) you need to craft a Mimic.ctx, which lets Irmin (and Git) allocate a resource that corresponds to an active connection (TCP, SSH, or HTTP(S)). Such a value can be provided by Git_unix.ctx, which requires a happy-eyeballs instance (to be able to resolve domain names).

This ctx is filled with several protocols provided by the git_mirage_* series (TCP, SSH with awa-ssh, and HTTP(S) with paf). All of them require a few more values. For instance, if you want to start an SSH connection, you must give an SSH key: Git_mirage_ssh.git_mirage_ssh_key is the witness used to add an SSH key to a Mimic.ctx. From that and a Smart_git.Endpoint.t, you can pull/push.

Something like:

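open Lwt.Infix

(* The endpoint URL is left empty, as in the original. *)
let addr = Result.get_ok (Smart_git.Endpoint.of_string "")

let run git_repo : unit Lwt.t =
  (* Awa.Keys.of_string returns a result, not an Lwt promise. *)
  match Awa.Keys.of_string ... with
  | Error (`Msg err) -> Lwt.fail_with err
  | Ok key ->
    Git_unix.ctx (Happy_eyeballs_lwt.create ()) >>= fun ctx ->
    (* Register the SSH private key in the Mimic.ctx. *)
    let ctx = Mimic.add Git_mirage_ssh.git_mirage_ssh_key key ctx in
    Irmin_git_unix.KV.Backend.Remote.v git_repo >>= fun remote ->
    (* [main] is the branch to synchronise; it is not defined here. *)
    Irmin_git_unix.KV.Backend.Remote.pull remote (ctx, addr) main >>= fun _ ->
    Lwt.return_unit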

It should work, but I’m not an Irmin expert (and I easily get lost among the functors and signatures).

The complete response

You should take a look at the Mimic tutorial, which explains the purpose of this library. Long ago, the choice was made to use Conduit as the library that provides the flow implementation needed to communicate with a peer. However, Conduit was created before extensible variants were available.

The goal of Conduit is to be able to allocate/connect a resource that can communicate with a peer, regardless of its implementation. This last point is particularly important for MirageOS, where the implementation must be completely abstract (we mainly use functors for this). The crucial question is: how do we allocate such a resource? In our experience, the connect function is strongly implementation-dependent and difficult to abstract. This is the case with lwt_ssl and ocaml-tls, for example: they can do the same job, but they expect different values (an Ssl.context or a Tls.Config.client).

Of course, we could then imagine a supra-interface for TLS that hides the implementation details behind a nice API. But such a job can be difficult (even if lwt_ssl and ocaml-tls do the same thing, there are noticeable differences, especially in the details) and arbitrary (the API may end up favouring one implementation over another).

That being said, Conduit provides an implementation-independent function to allocate such a resource: Conduit_lwt_unix.connect. This function also depends on a “ctx” that contains a description of what the user expects (a TCP connection? a TLS connection? etc.).

However, as I said, internally, Conduit does not use the latest OCaml features including extensible variants. The latter is necessary for us to extend the possible “communication” implementations. In the specific case of Git, we needed to extend Conduit with an SSH implementation.
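To make the extensible-variant point concrete, here is a small stdlib-only OCaml sketch. The names (endpoint, Tcp, Ssh, register, connect) are hypothetical and are not Conduit's or Mimic's API, and the returned strings merely stand in for real flows. A "core" part declares an open endpoint type, and a later "SSH" part adds its own case and connector without touching the original definition.

```ocaml
(* Open type of endpoints, declared once by a hypothetical core library. *)
type endpoint = ..

type endpoint += Tcp of string * int

(* Registered connectors; each returns [None] for endpoints it does not
   understand. A string stands in for a real connected flow. *)
let connectors : (endpoint -> string option) list ref = ref []

let register f = connectors := f :: !connectors

let connect ep =
  let rec try_all = function
    | [] -> Error "no connector for this endpoint"
    | f :: rest -> (
        match f ep with Some flow -> Ok flow | None -> try_all rest)
  in
  try_all !connectors

let () =
  register (function
    | Tcp (host, port) -> Some (Printf.sprintf "tcp://%s:%d" host port)
    | _ -> None)

(* Added later, by a separate module, without editing the core. *)
type endpoint += Ssh of { user : string; host : string; path : string }

let () =
  register (function
    | Ssh { user; host; path } ->
        Some (Printf.sprintf "ssh://%s@%s/%s" user host path)
    | _ -> None)
```

With a closed variant, adding Ssh would mean editing the core type and every match on it; with an open type, each implementation brings its own case.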

Conduit has other minor shortcomings, but we won’t dwell on them. A lot of work was done on Conduit (including CoHTTP) to fix this situation with version 3.0. However, due to internal differences within the Mirage core team (and, among other things, because I didn’t have the courage to continue), the project was aborted.

Still, as a user of ocaml-git and MirageOS, I considered this solution viable, and the core users were quite happy with the change. So I integrated my work under the name Mimic! I especially made its integration into MirageOS easy, which I think is the most important part. But I also have examples of Mimic uses where, for instance, I use a FIFO to implement tests between a real Git server and my implementation, or an integration of OpenSSH instead of awa-ssh.

Put more simply, Mimic lets the implementation of a protocol be decided at runtime: a bit like virtual methods in C++, but with module sauce :slight_smile:. Git only expects a protocol implementation, which is made concrete by the famous “ctx”. On top of that sits our implementation of the Git protocol: the Smart protocol. The advantage is that we can switch between SSH and HTTPS without really changing our code (as we could with a functor).
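The runtime selection described above can be sketched with stdlib OCaml alone. This is not Mimic's implementation or API: key, make_key, add, get and connect are hypothetical names, and the type-witness trick is the one used by heterogeneous maps such as Hmap. A context stores values of different types under typed keys, and the transport is picked at runtime from whichever keys are present.

```ocaml
(* Extensible GADT used as a per-key type witness. *)
type _ witness = ..

module type KEY = sig
  type t
  type _ witness += W : t witness
  val name : string
end

type 'a key = (module KEY with type t = 'a)

(* Each call creates a fresh constructor, hence a distinct key. *)
let make_key (type a) (name : string) : a key =
  (module struct
    type t = a
    type _ witness += W : t witness
    let name = name
  end)

type (_, _) eq = Refl : ('a, 'a) eq

(* Keys are equal iff their witnesses are the same constructor; the match
   also proves that the two value types are equal. *)
let same : type a b. a key -> b key -> (a, b) eq option =
 fun ka kb ->
  let module A = (val ka : KEY with type t = a) in
  let module B = (val kb : KEY with type t = b) in
  match A.W with B.W -> Some Refl | _ -> None

(* A context is a list of (typed key, value) bindings. *)
type binding = Bind : 'a key * 'a -> binding
type ctx = binding list

let empty : ctx = []
let add (k : 'a key) (v : 'a) (ctx : ctx) : ctx = Bind (k, v) :: ctx

let rec get : type a. a key -> ctx -> a option =
 fun k -> function
  | [] -> None
  | Bind (k', v) :: rest -> (
      match same k' k with Some Refl -> Some v | None -> get k rest)

(* Hypothetical keys, in the spirit of git_mirage_ssh_key. *)
let ssh_key : string key = make_key "ssh-key"
let http_headers : (string * string) list key = make_key "http-headers"

(* Choose a transport at runtime from what the ctx contains. *)
let connect ctx =
  match (get ssh_key ctx, get http_headers ctx) with
  | Some key, _ -> "ssh with key " ^ key
  | None, Some hdrs ->
      Printf.sprintf "https with %d header(s)" (List.length hdrs)
  | None, None -> "plain tcp"
```

The calling code only ever passes a ctx around; what ends up being used is decided by whoever filled it.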


Thanks so much for the detailed reply, @dinosaure! I should have time to process your guidance here over the weekend, and I’ll update with any significant progress :slight_smile:


Just coming around to working through this advice. Thanks again!

I had a question regarding:

  let+ key = Awa.Keys.of_string ... in 

Trying to figure out what sort of input this expects, I peeked at the source:

It seems to require a string of the form "rsa:{key}" or "ed25519:{key}". This is not a format I’m familiar with (I’m certainly no cryptography expert!). Do you know of any documentation for this format?

I’m also curious why it relies on this string prefix to identify the key type, since it doesn’t seem to relate to how keys are stored or generated from what I’ve seen.

Yes, the format is undocumented in the awa-ssh package (though it is documented in the mirage tool); by the way, thanks for filing an issue about that. Moreover, awa-ssh provides a simple tool, awa_gen_key, which can be useful for you. It generates an SSH key on the fly. The first line corresponds to what you should give to Awa.Keys.of_string. The second line is the public key of the SSH key (which should be saved on your server).

For instance, for a RSA key:

$ awa_gen_key
seed is 1Btkt/itNjJEVSYqMURUYcrjO9f4j7IvMKZiBNzu
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCeuprsUNNl5nsjbPo6Fzi1JgVBMp4gin0Mw5y1+B+goqITZPpBJjQUNNP/twM+dtE/0Lz23BZcCkiAuUM1GFHU+Eg6TAcXxRdzJzFfjMAi4bvLSy8CWCJ0tonvoVCWYA3i5CGy6mZu84u96DIFh8BXEEIOklSQHG8V1Cu1pV4z+UyaSpThhgo+vK5/89sLAg0N7YuupvgK6WlIjAdbfHBX7s6pmdhoGoL1iM5HfP5x8qpr3UiQQehEIOwuRtEUXvKd3GeUeoUPoicybr5i3C6alYGZxjhvE30N82qNEkgKS3HTPF05vhZoVXDEcfWguOuwSQq3mmY97uAS7WDBD/5ugfex51vx1izMBqOHIB0umGxMaVjRMVixxSofBo7X74/rwJycYOEx2IXNlb14FNom4InMw7VlvfJqyaCnq1zMew5L/U5fzRlfdV3uaB7eWITruAE3cESC+MiILBSoT7XbquhmkGjj515+JOh08RqeCHZHKyJzOiDgmNMZRnV0kWSFM1fpGhDtcyVZVGc0is158FG2Q8eNdAl46hVHofzfXIqIfi0OBvfBebDYrel9uyP0LMaMANuz1A0Yt1q3/hSwF/e/p47bbecEoFi26agmI8qYGRtOuvtMHkmSz5ojD64MJ4k45XHO7Rq/HV6EuWJVLKScGBlfth8QsYxpQJvOzQ== awa@awa.local

Then your --ssh-key argument will be rsa:1Btkt/itNjJEVSYqMURUYcrjO9f4j7IvMKZiBNzu. Note that this argument is not your private RSA key (which is too long) but a seed from which the private RSA key can be regenerated with the Fortuna PRNG algorithm.

For ed25519, the private key is short enough to be passed on the command line. It follows the same idea:

$ awa_gen_key --keytype=ed25519
ED25519 private key VByFnP541hXF+hN7Ia2ZyB+SLECOnfcuOHvPloEOlpw=
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBiMEY2+mBPN0zb072myH8flEhc3oakFE/VkGN4mfEUB awa@awa.local

The argument will be: --ssh-key=ed25519:VByFnP541hXF+hN7Ia2ZyB+SLECOnfcuOHvPloEOlpw=.
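A minimal sketch of how a single "<type>:<value>" argument like the ones above can be split into a key type and its value. It only assumes the format as described in this post; it is not awa's actual Keys.of_string, which additionally turns the value into a private key (for rsa, by regenerating it from the seed).

```ocaml
type key_type = Rsa | Ed25519

(* Split "<type>:<value>" at the first ':' and map the prefix to a variant.
   For rsa the value is a seed; for ed25519 it is the base64 private key. *)
let parse_ssh_key (s : string) :
    (key_type * string, [> `Msg of string ]) result =
  match String.index_opt s ':' with
  | None -> Error (`Msg "expected <type>:<value>, e.g. ed25519:<base64>")
  | Some i -> (
      let typ = String.sub s 0 i in
      let value = String.sub s (i + 1) (String.length s - i - 1) in
      match typ with
      | "rsa" -> Ok (Rsa, value)
      | "ed25519" -> Ok (Ed25519, value)
      | other -> Error (`Msg ("unknown key type: " ^ other)))
```

For example, parse_ssh_key "ed25519:VByFnP541hXF+hN7Ia2ZyB+SLECOnfcuOHvPloEOlpw=" gives Ok (Ed25519, "VByFnP541hXF+hN7Ia2ZyB+SLECOnfcuOHvPloEOlpw=").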

Note that because a protocol implementation other than awa-ssh can be instantiated, it is possible to inject an implementation that calls the ssh tool directly (as I did for docteur), as explained in my previous post. This way, if the user uses ssh-agent and wants to initiate an SSH connection with their own SSH key, they can :slight_smile:; it will be a basic call to ssh, as Git does.

This is neat, but AFAIU it is not useful if the goal is to connect to a server that is already configured (i.e., to interact with an existing Git server registered with your public key). I guess it is probably useful when setting up systems of unikernels?

Thank you for the follow-up. It really helps clarify things, and I was able to catch a problem as a result.

The ed25519 private key I am trying to provide ends up being much longer when Base64 encoded (I don’t know why), and I think I now understand why I am hitting the (quite inscrutable) error ERROR: Cannot parse point: invalid length when trying to provide the key.
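The length mismatch can be checked with a little arithmetic. Assuming the ed25519 value is the raw 32-byte private key (which matches the 44-character value awa_gen_key prints earlier in the thread), its padded base64 form is always 44 characters, so a much longer string, such as the body of an OpenSSH private key file, cannot be the expected input. The helper name check_ed25519_value is hypothetical.

```ocaml
(* Padded base64: every 3 input bytes become 4 output characters, rounded up. *)
let base64_length n = 4 * ((n + 2) / 3)

(* Assumed size of the raw Ed25519 private key (seed) in bytes. *)
let ed25519_seed_bytes = 32

(* Reject values whose length cannot be a base64-encoded 32-byte key. *)
let check_ed25519_value (b64 : string) : (unit, string) result =
  let expected = base64_length ed25519_seed_bytes in
  let got = String.length b64 in
  if got = expected then Ok ()
  else
    Error
      (Printf.sprintf
         "expected %d base64 characters (a 32-byte Ed25519 key), got %d; \
          is this a whole OpenSSH key file instead of the raw key?"
         expected got)
```

A check like this would turn a low-level "invalid length" into a message that points at the likely cause.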

In general, I think it could make things easier if Awa could take a path to a private key file and then do whatever it needs to extract things in the format it needs. Would that be feasible?

In particular, it seems to me that the API might be made a bit easier with something like Awa.Keys.of_string : typ -> string -> (Awa.Hostkey.priv, [> `Msg of string ]) result. This avoids a "stringly typed" input; to make it even more convenient, it could do the base64 encoding of the input parameter (perhaps also providing an Awa.Keys.of_string_base64 that assumes the input is already encoded).

It’s weird to me that if I had a properly encoded private key I’d need to prefix "ed25519:" to the string to feed it to this function, only to have the function then convert this to a variant. WDYT?

I’ve really strayed off topic now, but it’s an interesting rabbit hole, despite the challenges and the wall-head-banging :smiley:

Thanks again for all the help and feedback!

About a year ago, GitHub stopped serving the git protocol.

The example in the doc is stale in that it points to a repo hosted on a platform that stopped using that protocol, but it might otherwise be valid.
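If you have a git:// URL for a repository on GitHub (or another host that no longer serves the git protocol), switching to the https:// form is usually enough, as observed at the start of the thread. Below is a hypothetical stdlib-only helper that does this rewrite; it is not part of Irmin or Smart_git, and the resulting string would still be parsed with Smart_git.Endpoint.of_string.

```ocaml
let git_scheme = "git://"

let has_prefix ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Rewrite git://<host>/<path> to https://<host>/<path> when <host> is in
   [hosts] (assumed to have dropped the git protocol); leave other URLs as-is. *)
let git_to_https ?(hosts = [ "github.com" ]) url =
  if not (has_prefix ~prefix:git_scheme url) then url
  else
    let n = String.length git_scheme in
    let rest = String.sub url n (String.length url - n) in
    let host =
      match String.index_opt rest '/' with
      | Some i -> String.sub rest 0 i
      | None -> rest
    in
    if List.mem host hosts then "https://" ^ rest else url
```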

Please note that awa-ssh is still in its early days, and indeed some functionality may be missing. The one you’re referring to seems to be the OpenSSH private key format (issue #17 in mirage/awa-ssh on GitHub, "openssh private key format") and its exposure.

Why the API is like it is “with strings” – well, eventually in the MirageOS ecosystem you need to pass data as strings as boot parameter. Now, there’s the choice: use a lot of boot parameters (--ssh-key-type and --ssh-key-value) or to flatten that to a single one (--ssh-key). After some experience with the former, I tend to use the latter.
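One way to see the trade-off between two boot parameters and one flattened parameter is a hypothetical normaliser that accepts both. The parameter names mirror --ssh-key, --ssh-key-type and --ssh-key-value from this post; the (name, value) list is an assumption standing in for however the boot arguments are actually parsed.

```ocaml
(* Accept either --ssh-key=<type>:<value> or the pair
   --ssh-key-type / --ssh-key-value, and normalise both to "<type>:<value>". *)
let normalise_ssh_key (params : (string * string) list) : (string, string) result =
  let find k = List.assoc_opt k params in
  match (find "ssh-key", find "ssh-key-type", find "ssh-key-value") with
  | Some flat, None, None -> Ok flat
  | None, Some typ, Some value -> Ok (typ ^ ":" ^ value)
  | None, None, None -> Error "no SSH key given"
  | _ -> Error "use either --ssh-key, or both --ssh-key-type and --ssh-key-value"
```

The two-parameter path brings extra invalid combinations (one half missing, or both forms given) that must be reported, which is one plausible reason to prefer the single flattened parameter.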

And what value to pass? Of course you can use the openssh private key format – for an ed25519 key that would be 399 bytes (compared to 69 + 8 = 77 bytes with the current format). Certainly there’s a limit of what argv can carry (I don’t know the exact limits, would need to look into solo5).

I don’t know what your operation story is, but I use separate keys for myself compared to a unikernel. The idea behind this is that when the key of the unikernel is compromised, I can retract it separately from other keys.

Sure, any PR for awa_gen_key implementing that would be very welcome.

(On a related note, we recently worked on explaining the expected format for X509 authenticators and nameservers; see mirleft/ocaml-x509 commit dd16ca4 ("Auhenticator.of_string: improve error message to output the desired f…"), mirage/ocaml-git issue #582 ("Mirage_git_http: When failing to parse authenticator report what the expected format is"), mirage/ocaml-git pull request #593 by dinosaure ("Explain when the argument used for the authenticator is malformed"), and mirage/ocaml-dns pull request #322 by hannesm ("dns-client-mirage udp"). We should enhance awa’s Keys.of_string in a similar fashion.)
