How do you do stream writes in Irmin?

I am writing a p2p blob store, using the Git Unix store with Cstruct as its contents. The problem is that the blobs are too big to fit in memory, so I decided to chunk the files into separate cstructs, send them over the wire once the session has been established, and then write the chunks. But in order to append in Irmin you need to do a functional update, which won’t work in this case. How do you get around this?


Hmm, I can explain something about ocaml-git, but I will let @samoht explain the Irmin part.

In the current version of ocaml-git (1.11.2), we have already caught some problems with memory consumption when we deserialize a Git object. As you said, for a blob object (which corresponds to your file), Git stores the whole file in it and, obviously, a large blob cannot be loaded entirely into memory.

To address this specific problem, we are developing a new version of ocaml-git which decodes your blob chunk by chunk (I mean fixed-size chunks). This is the interface of the Decoder available in the Blob/Value implementation, and it is used whether your Git object is a loose object or a packed object.

Then we can build a stream like (unit -> Cstruct.t) which physically uses a single Cstruct.t and refills it on each call, which keeps memory consumption under control.
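To illustrate the idea of a stream that reuses one buffer, here is a minimal sketch. It is not the ocaml-git Decoder interface: `chunk_stream` and `fold_stream` are hypothetical names, and `Bytes.t` stands in for `Cstruct.t` so that the example only needs the standard library. The stream also returns the number of valid bytes, since the last chunk is usually shorter than the buffer.

```ocaml
(* Hypothetical helper, not part of ocaml-git or Irmin.
   Returns a pull-based stream over [ic]. The same buffer is handed
   back on every call, so the consumer must use (or copy) the bytes
   before asking for the next chunk. *)
let chunk_stream ?(chunk_size = 4096) (ic : in_channel)
  : unit -> (Bytes.t * int) option =
  let buf = Bytes.create chunk_size in
  fun () ->
    match input ic buf 0 chunk_size with
    | 0 -> None                 (* [input] returns 0 at end of file *)
    | n -> Some (buf, n)        (* only the first [n] bytes are valid *)

(* Hypothetical helper: consume a stream chunk by chunk. *)
let fold_stream f init next =
  let rec go acc =
    match next () with
    | None -> acc
    | Some (buf, len) -> go (f acc buf len)
  in
  go init
```

For example, `fold_stream (fun acc _ len -> acc + len) 0 (chunk_stream ic)` counts the bytes of a file while holding only one `chunk_size` buffer in memory, whatever the size of the file.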

However, I said “we are developing”: that means this work is not done yet in ocaml-git's API and, by extension, in Irmin's API.

So this is my first technical response: you need to wait for the next release of ocaml-git :smiley:!

However, I don’t know your plans for irmin/ocaml-git, but you need to know that Git (and ocaml-git) was not designed to store huge files. I mean, if you want to store a video, or more simply an image, you could run into problems simply because Git was not designed for that.

Indeed, the compression (zlib) cannot compress a video or an image which is already compressed with a better algorithm (try to zip a GoT.avi episode: you will gain nothing in size). The second compression in Git is delta-ification, which stores a patch between 2 files, and we can easily say that a diff between 2 videos/images will not be very efficient.

For all of these technical reasons, it is generally a bad idea to push a video/image to your Git repository. Firstly because you will have a big blob object, and secondly because retrieving this blob will be slow (we still have to apply the deltas, even though they save very little).

So my second response is to consider whether ocaml-git (or Git in general) is the best choice to store some (huge?) files.

But thanks for using ocaml-git :slight_smile: (you can follow the new version of ocaml-git in this PR, and if you have more specific questions, feel free to ask!)


Thanks for the response. I’m probably just going to switch over to the plain old file system, since either way the synchronization primitives in Irmin aren’t suitable for this endeavor, and I really don’t need the consistency guarantees of the 3-way merge because the data in question is immutable blobs.
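For what it is worth, a plain file-system store for immutable blobs can be quite small. The following is only a sketch under my own assumptions: `store_blob` is a hypothetical name, the layout (one file per blob, named by its digest, all in one directory) is invented for illustration, and the standard library's `Digest` (MD5) is used only to avoid dependencies, so a real store would want a stronger hash. Chunks are appended to a temporary file as they arrive, so the whole blob is never held in memory.

```ocaml
(* Hypothetical sketch: store one blob under [dir], named by its digest.
   [next] yields (buffer, number of valid bytes) until it returns None. *)
let store_blob ~dir (next : unit -> (Bytes.t * int) option) : string =
  (* Write into a temporary file in the same directory first. *)
  let tmp = Filename.temp_file ~temp_dir:dir "blob" ".part" in
  let oc = open_out_bin tmp in
  let rec drain () =
    match next () with
    | None -> ()
    | Some (buf, len) -> output oc buf 0 len; drain ()
  in
  (* On failure, drop the partial file and re-raise. *)
  (try drain () with e -> close_out_noerr oc; Sys.remove tmp; raise e);
  close_out oc;
  (* Blobs are immutable, so the content digest can serve as the name. *)
  let name = Digest.to_hex (Digest.file tmp) in
  Sys.rename tmp (Filename.concat dir name);
  name
```

Because the name is derived from the content, storing the same blob twice produces the same file, and no merge is ever needed.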
