OCaml Documentation Open Thread

It seems @jacquev6 has already implemented a Sphinx domain for OCaml. Maybe that could be useful?

I have indeed. I used it to document ocaml-hashid if you want to have a look. This is a very short doc but I find it satisfying and the domain can do much more.

I also tried to implement an autodoc-like Sphinx directive (to extract doc from the code during the Sphinx build instead of generating .rst files beforehand), in the same project, but it is highly experimental. I would gladly hand it over to someone who knows the OCaml compiler internals better than me.


I can see how this could be useful. What might be challenging is supporting cross-referencing in the same way odoc currently does. But I like the idea of delegating this work to existing tools. In particular, GitHub already knows how to render it.

On the other hand I would actually like to consider reStructuredText as an alternative to ocamldoc in the future. It’s more flexible than markdown and can be easily extended.


I would really like people to actually start writing documentation rather than endlessly ponder what will be the next half-broken solution to this supposed problem.

The tools might not be perfect, but they have been here and improving for as long as I have been publishing free software OCaml libraries, which is more than 10 years.

The problem is not the tools, the problem is as everywhere in computing: people hacking without documenting what they do.



Better tools are great of course, but even plain text files of good documentation beat elaborate libraries no one can use because there’s no documentation at all.

We even have better than that.

We have the ocamldoc language. It lets you keep your documentation exactly where it is needed, in mli files, and it will carry itself along if you move your sources around.

The ocamldoc language is a fully featured document markup language with sectioning and checked cross references to sections and API entry points.
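
As a rough illustration of the markup described above, here is a minimal sketch of documented signature items with a section heading and cross references. The `Counter` module and all its names are hypothetical, and the signature is written inline only so that the snippet compiles on its own; in a real project it would live in an .mli file. The exact reference syntax accepted may differ slightly between ocamldoc and odoc.

```ocaml
(* Hypothetical module, for illustration only. The signature is inline so
   that this file compiles alone; normally it would be counter.mli. *)
module Counter : sig
  (** A tiny counter, used only to illustrate ocamldoc markup.

      See {!section:basics} for the main entry points. *)

  (** {1:basics Basics} *)

  type t
  (** The type of counters. *)

  val zero : t
  (** [zero] is a counter that has never been incremented. *)

  val incr : t -> t
  (** [incr c] is [c] incremented by one. See also {!val:value}. *)

  val value : t -> int
  (** [value c] is the number of times {!val:incr} was applied to
      {!val:zero} to obtain [c]. *)
end = struct
  type t = int
  let zero = 0
  let incr c = c + 1
  let value c = c
end

(* The doc comments are ordinary comments to the compiler, so the module
   behaves as usual. *)
let three = Counter.(value (incr (incr (incr zero))))
```

The square brackets mark code spans, `{1:basics ...}` opens a labelled section, and the `{!...}` forms are the references that the documentation tool checks when it generates the output.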

Doing manuals and tutorials inside .mli files always felt a bit cumbersome but is entirely doable, as can be witnessed in many of my packages (see this one for example). The good news is that nowadays we will gradually be able to write those larger pieces in separate .mld files. Code sample extraction is not there yet, but I hope we can have a simple design implemented soon rather than having these discussions.

The other good news is that, as a programmer, the effort you need to provide in order to make the documentation you write in this language available to end-users is approaching zero:

  • If you are using topkg or dune-release (assuming they didn’t kill that workflow), publishing your docs online on github is a topkg distrib && topkg publish doc away.

  • Distributing this documentation with your package releases is also automatically done if you use topkg+ocamlbuild or dune: these systems compile and install the right files, which odig can then pick up in order to generate cross-referenced documentation for all the packages that are installed in your opam switch. That allows you to peacefully read all this wonderfully produced documentation offline with the stylesheet that suits you.

One might complain that the ocamldoc language syntax is none of these alternative document markup languages whose popularity or existence came after ocamldoc's birth but if you ever tried to write a polymorphic variant in a markdown code span you will be glad that ocamldoc’s language isn’t that one.
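
To make the polymorphic variant point concrete: Markdown delimits code spans with backticks, which is the same character that starts a polymorphic variant tag, whereas ocamldoc delimits them with square brackets. The sketch below uses a hypothetical `color` type and `to_hex` function purely for illustration.

```ocaml
(* Hypothetical example type, for illustration only. *)
type color = [ `Red | `Green | `Blue ]

(** [to_hex c] is the hexadecimal code for [c]. For example, [to_hex `Red]
    is the string [ff0000]. The backtick of [`Red] needs no escaping
    inside the square brackets. *)
let to_hex : color -> string = function
  | `Red -> "ff0000"
  | `Green -> "00ff00"
  | `Blue -> "0000ff"
```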

I understand the ocamldoc language is one more thing to learn as a newcomer but it’s not hugely complex, it’s there, it works with the advantages highlighted above and an aspiring OCaml working programmer shall meet it quite soon anyways since there are thousands of lines of documentation in hundreds of projects that are using it.


Yes, of course we do — my point was that writing documentation at all is more important than how you format it. (That said, given that documentation exists, it’s nice to have it well formatted.)

It would still be nice, I think, if there was a website where the docs for everything in OPAM were kept (and were linked to from the opam.ocaml.org site). That is because often you have no idea if you want to use something before reading the documentation, so having to install it first is a bit of a pain. As it stands, it should be pretty straightforward to do this automatically.


Publishing your own documentation is a bit of a band aid currently. Documentation that isn’t versioned like the packages themselves ends up creating its own set of problems. Cloning a package and checking out the appropriate tag still ends up being the most reliable approach to read accurate docs.

With topkg the watermarking process labels your documentation with version information. Not that it can’t be improved but at least you know what you are reading.

Not really, simply odig what you have installed.

With topkg the watermarking process labels your documentation with version information. Not that it can’t be improved but at least you know what you are reading.

Thanks for informing. I wonder if this is available to odoc generated docs yet.

Not really, simply odig what you have installed.

Disagree. I would like to read the docs for some packages before installing them. Installing an old package can be a highly destructive operation that can lead to build failures, downgrades, and wasted cycles. I would prefer to avoid it if at all possible.

Yes the watermarking process is orthogonal to tooling. It simply acts on text files.

It would certainly be nice to have versioned docs of a package, but I think you are blowing this a bit out of proportion. Here are the two most common usage scenarios:

  1. You’d like to have a look at a library’s documentation before installing it. In that case you are unlikely to be interested in perusing an older version of the docs. The docs for the latest version, as published online by topkg publish doc, will do. It would be nicer to have that in a centralized place, and it would be perfectly doable now without too much effort (assuming only the latest version is published).

  2. You need to access the documentation of a library from a previous version. In that case it’s likely because you are constrained to use a previous version in your project and you will actually have it installed in the opam switch of your project and odiging will be perfectly fine.

This is to say that in 99% of cases the status quo is mostly fine, and I’d rather see odoc progress on other fronts than on that particular problem.


(Sadly, discuss insists that my reply must be at least 20 chars long.)

The magnitude of this problem depends on personal patterns I suppose. I struggle with this issue quite a bit when trying to share or read documentation links that don’t change.

But I agree with you that odoc has bigger fish to fry. This problem can be easily solved as part of the opam package publishing process.

I was reading “Using Core_kernel.List.mem. And why using Base&Core instead of stdlib?”, where @perry and @Yaron_Minsky discussed the Jane Street docs (and Base/Core in particular). @perry had some thoughts there: Using Core_kernel.List.mem. And why using Base&Core instead of stdlib?

I wanted to drop a reference here and follow-up with some thoughts of my own.

To summarize:

  1. There is too much indirection from the entry-point of documentation to the modules of interest.
  2. For some important functions (e.g. Hashtbl.create) the parameters are not documented.

I agree with (1), but also admit that this is likely an issue that can be solved by tooling (as @Yaron_Minsky mentioned). I actually tend not to agree that (2) is a major issue. That being said, a very closely related issue to (2) is understanding the use of first-class modules in Base/Core.

I find that people who are getting started with the Jane Street libraries have a difficult time getting hold of the types they want (e.g. a Map), but once they know how to create one, they have no issue looking through the docs and figuring out what the different operations do (find, find_exn, etc.).

Some of how to use the first-class modules is explained in Real World OCaml (and this is often where I point people who ask me), but perhaps a more complete description would be useful.

Admittedly, some of this will become more approachable as the “indirection problem” is solved. The type (module Base__.Hashtbl_intf.Key with type t = 'a) would look a lot less scary if it looked like (module Base.Hashtbl.Key with type t = 'a).

The other question I get is: “Okay, I get that I have to pass a module as the argument here, and that the module has to satisfy the interface … but how do I get one of those modules?”

So, often people don’t realize that many of the standard types that you might want to use as the key type in a Map/Hashtbl are designed to satisfy that interface. For example, if I’m new and don’t know any better I may just go to https://ocaml.janestreet.com/ocaml-core/latest/doc/base/Base/String/ and then get discouraged because there’s no indication that this module satisfies the interface for Hashtbl keys.
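
To illustrate the pattern being discussed without depending on Base, here is a minimal stdlib-only sketch. The `Key` interface and the `dedup_sorted` function are hypothetical, much-simplified stand-ins and not Base's actual definitions; the point is only that a function can ask for a module satisfying an interface, and that existing modules such as `String` and `Int` may already satisfy it.

```ocaml
(* Hypothetical, simplified key interface; NOT Base's actual definition. *)
module type Key = sig
  type t
  val compare : t -> t -> int
end

(* The key module is passed as a first-class value, and its type [t] is
   tied to the element type of the list. *)
let dedup_sorted (type a) (module K : Key with type t = a) (xs : a list) : a list =
  List.sort_uniq K.compare xs

(* The standard String and Int modules already provide [t] and [compare],
   so they can be packed and passed directly. *)
let names = dedup_sorted (module String) [ "b"; "a"; "b" ]
let numbers = dedup_sorted (module Int) [ 3; 1; 3 ]
```

In Base the same idea shows up in calls such as `Hashtbl.create (module String)`: the answer to “how do I get one of those modules?” is often that the module for the key type itself is the one to pass.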

Hope this is helpful, just trying to transfer the conversation back to this thread and add my 2 cents.

I’m afraid that quoting @perry in extenso in this thread needlessly adds a lot of material (a simple pointer would have been enough).
Moreover, regarding the discussion itself, can we consider it documentation (i.e. information that helps decide which stdlib is good for us, why, and how to use it), or is it a discussion exposing several standpoints?
Eventually, if there is to be real documentation for that, I’ve heard that it could/should be OCamlVerse.

EDIT: all your thoughts are of course very welcome :wink:

Point taken, I removed the quote and replaced it with a link.

I’m not sure how this thread has evolved (I haven’t kept up with all of it), but my understanding is that its purpose is to discuss as a community the state of documentation and how to improve it.

This tweet caught my attention because it applies perfectly to OCaml as well. OCaml docs, like Haskell’s, are good at the “hard documentation” and bad at the “soft documentation.” I’ve copied the soft documentation list from the slide below.

  • Problem definition
  • Installation/setup
  • Multiple examples
  • Prose: how, why, when and when not
  • Terminology explanation
  • Contribution guidelines
  • FAQs
  • Tutorials

Sometimes, sadly, they’re also bad at the hard documentation, but yes, this is a very useful distinction to make.

Soft documentation is essential for adoption, and for a library to gain more contributors. I think it’s easier to fix the hard documentation if you have proper soft documentation.