One big library or many small libraries?

I have been using OCaml for more than 7 years, and my projects have produced
a lot of reusable code. My main topics are language design and
compiler writing. In this area the two main subtasks are

  • parsing

  • pretty printing

The reusable parts in this domain have already been made available in the opam
library fmlib. But more general functionality is needed as well, e.g.

  • An Elm-like framework to make a subset of a language available within a
    browser (in practice, a “try it on the web” page).

  • An IO framework based on common IO functions which can be implemented either natively in OCaml or compiled to JavaScript for execution with Node.js. The latter is extremely useful for deploying programs written in OCaml to users who don’t have an OCaml environment (in particular Windows users).

What is the best way to make these functions available to the OCaml community?

  1. Make a big library and include the functions successively i.e. in my case put
    all the generic functions into fmlib.

  2. Make more libraries where each library serves a very specific purpose e.g. a
    library elmish which provides the browser framework and a library io
    which allows for compilation to native ocaml and nodejs using a common
    interface.

I personally find the second option more convenient. With dune it has become easy to use a single repository to synchronise and release multiple libraries.

Even though I never (never) (never) use Dune, like @mseri I also find it nicer to release functionality in the form of a bunch of smaller libraries, or more precisely, in the form of a bunch of findlib packages… Dune users should probably skip this reply, b/c it won’t be useful to you.

For example, the pa_ppx package has a ton of findlib subpackages, like pa_ppx.deriving_plugins.{std,map,protobuf,...} . With not-ocamlfind reinstall-if-diff it is straightforward to

  1. break up a large project into a bunch of subdirectories, each of which provides some small number (typically one) of findlib packages
  2. build each subdir based on the assumption that previous subdirs have been built-and-installed into a local findlib repository (e.g. TOP/local-install)
  3. install that subdir’s findlib-managed files to that repository using not-ocamlfind reinstall-if-diff so that if the rebuild didn’t actually change any files, then nothing gets installed
  4. just call build-and-then-install-locally on each subdir in dependency order

The net effect is that I can have directories A.B.C.D that have some nontrivial dependency-order, and can call build/install on them in that order, without incurring unneeded rebuilds. Then when it comes time to install the whole thing, I just reuse the “local install” step with a different target, to install into a global findlib repo.

This also means that when I write tests, I write them as if the package being tested is already installed globally.

Funnily enough, I was just discussing the need for this library with @tmattio and @rgrinberg in the context of the OCaml Platform VSCode plugin.

There are some opam plugins to do with tools bootstrap that would benefit from compiling both to native code, and also to Node via js_of_ocaml, so that they can be run either from the usual opam CLI or driven from the VSCode plugin directly. So I’d be delighted to see a small library around this for IO, and we’d be glad to give it a try to see if it works for the VSCode plugin and contribute any improvements back.

This also indirectly answers your question about big vs small libraries. It’s almost always more useful to have smaller libraries in opam that can compose together. OCaml’s a pretty good glue language for this purpose.

Interesting that other people see similar requirements. The question is: what are the common IO functions which can be implemented using both Node and OCaml?

I have a prototype where I have used a Haskell-like IO monad as the interface (see signature for an IO environment as a primitive example here). I have implemented filesystem access functions via Node and OCaml’s Unix interface. This is possible because Node’s IO functions are Unix-like. For a generic library I am considering using Luv on the OCaml side. Since Node is based on libuv, it should be easy to find the common functions.
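To make the idea concrete, here is a minimal sketch of what such a monadic interface could look like. It is not the prototype's actual signature: the module names `IO`, `Native` and `Copy` and all function names are made up for illustration, and the native backend uses only the standard library instead of Unix or Luv.

```ocaml
(* Hypothetical interface: every name here is illustrative, not fmlib's API. *)
module type IO = sig
  type 'a t
  val return : 'a -> 'a t
  val ( >>= ) : 'a t -> ('a -> 'b t) -> 'b t
  val read_file : string -> string option t   (* None on any IO error *)
  val write_file : string -> string -> bool t (* false on any IO error *)
  val run : 'a t -> 'a
end

(* Native backend built on the standard library only. A Node backend would
   implement the same signature on top of bindings to Node's fs module. *)
module Native : IO = struct
  type 'a t = unit -> 'a

  let return x () = x
  let ( >>= ) m f () = f (m ()) ()

  let read_file path () =
    match open_in_bin path with
    | exception Sys_error _ -> None
    | ic ->
      Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () ->
        match really_input_string ic (in_channel_length ic) with
        | s -> Some s
        | exception (Sys_error _ | End_of_file) -> None)

  let write_file path contents () =
    match open_out_bin path with
    | exception Sys_error _ -> false
    | oc ->
      Fun.protect ~finally:(fun () -> close_out_noerr oc) (fun () ->
        match output_string oc contents; flush oc with
        | () -> true
        | exception Sys_error _ -> false)

  let run m = m ()
end

(* Programs are written once against IO and instantiated per backend. *)
module Copy (Io : IO) = struct
  open Io

  let copy src dst =
    read_file src >>= function
    | None -> return false
    | Some s -> write_file dst s
end
```

With this shape, `module C = Copy (Native)` gives a native file copy, and the same functor applied to a Node-backed module would give the JavaScript version without touching the program logic.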

I don’t know what your requirements are. Could you sketch them, so that we can see whether there are synergies?

That sounds about right: the VSCode plugin essentially needs to install and bootstrap various tools (mostly opam and dune), so it’s a grab bag of operations involving spawning commands and reading/writing to paths. You can see a Bos/Unix based version here that is typical of the plugin commands: opam-tools/opam_tools.ml at master · ocamllabs/opam-tools · GitHub. Nothing particularly monadic is needed; just encoding results using the Rresult convention is fine I think.

A good place to start is probably binding things on the Node side and pulling out a module signature. Implementing anything Node does in native code should be easy, but not the other way around.
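As a rough illustration of that approach, the signature pulled out of the Node bindings might look something like the sketch below. The names `SYS`, `Native_sys` and `bootstrap` are hypothetical and not taken from opam-tools, Bos or any existing Node binding; errors follow the Rresult-style convention of a `` `Msg `` variant inside the standard `result` type, and the native side uses only the standard library.

```ocaml
(* Hypothetical signature; names are illustrative only. *)
type 'a or_msg = ('a, [ `Msg of string ]) result

module type SYS = sig
  val run_cmd : string -> int or_msg          (* exit code of a shell command *)
  val read : string -> string or_msg          (* whole file contents *)
  val write : string -> string -> unit or_msg (* overwrite a file *)
end

(* Native implementation; a Node one would wrap child_process and fs. *)
module Native_sys : SYS = struct
  (* Turn the exceptions of the stdlib IO functions into Error values. *)
  let guard f =
    try Ok (f ()) with
    | Sys_error e -> Error (`Msg e)
    | End_of_file -> Error (`Msg "unexpected end of file")

  let run_cmd cmd = guard (fun () -> Sys.command cmd)

  let read path =
    guard (fun () ->
      let ic = open_in_bin path in
      Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () ->
        really_input_string ic (in_channel_length ic)))

  let write path contents =
    guard (fun () ->
      let oc = open_out_bin path in
      Fun.protect ~finally:(fun () -> close_out_noerr oc) (fun () ->
        output_string oc contents;
        flush oc))
end

(* A plugin-style command written against the signature only. *)
let bootstrap (module S : SYS) ~marker =
  let ( >>= ) = Result.bind in
  S.run_cmd "exit 0" >>= fun code ->
  (if code = 0 then Ok () else Error (`Msg "command failed")) >>= fun () ->
  S.write marker "done" >>= fun () ->
  S.read marker
```

Passing the backend as a first-class module keeps `bootstrap` independent of whether it ends up running natively or under Node.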

I would also suggest smaller libraries. Sometimes I see a collection of libraries released as one bundle, and it always feels too opinionated to me. While it reflects the way the author uses them, it’s unlikely that I would want to use them in the same way.

One question: Suppose I have one repository with several opam packages, i.e. several *.opam files. I saw that e.g. cohttp does it that way. Is it possible to make one opam release for all packages, or do I have to make one release for each package?

Since every release in opam takes one or more days, releasing all packages one by one might take more than a week if the packages have internal dependencies.

You can release them all at once. Usually dune-release and opam-publish take care of that for you. See for example [new release] decoders-yojson, decoders-cbor, decoders, decoders-jsonm, decoders-ezjsonm, decoders-sexplib, decoders-bencode and decoders-msgpck (0.6.0) by mattjbray · Pull Request #18324 · ocaml/opam-repository · GitHub (Which has just appeared on the opam-repository)

That is good news. However, I do not yet fully understand how dependencies between the different packages are handled.

Let me state the problem I see:

  1. Each package of the set of packages needs its own entry (basically an opam file) in the opam repository.

  2. I can try to release the set of packages with one pull request.

  3. Some of the packages might depend on a version of another package in the set which is about to be released but not yet available in the opam repository.

Question: Does the CI use the packages to be released, and not the packages already released, when verifying the individual packages?

If the constraints are lax, then it may pick either the version from the release or an earlier one. However, this became quite messy, so we agreed on the following convention: we version only the “root” package and make all the others depend on it with the {=version} tag. Effectively you will be releasing a new version of the whole “bundle”, even if only some leaves are updated.

Thanks for the answer.

Does the same happen if I release the bundle manually, i.e. if I put several *.opam files in my fork of the opam repository and make one pull request?

The CI automatically executed in the pull request builds all packages separately. Does it build the packages according to their internal bundle dependencies? I.e. can I make one package of the bundle depend on the to-be-released version of another package of the bundle?

It depends on how you set it up. If you pin the packages, then it will run the tests using the local unreleased version. The way I usually do it is the following: ocaml-cohttp/workflow.yml at c3a59cd11fae2ccf084fbfc3eb02b75773511d25 · mirage/ocaml-cohttp · GitHub

Maybe we are talking about different problems. If I make a pull request to the opam repository from my clone, I cannot influence how the opam repository builds my packages. I cannot pin anything. I just make the pull request and everything runs automatically.

Oh, sorry. Yes, the opam repository CI will pick the version of the packages you submitted.

That’s great. Thank you for the information.