OCaml stdlib and death by a thousand papercuts

0xa2c2a · January 19, 2022, 11:31am

Status Quo

- You roll your own standard library because the one bundled with the compiler is incomplete or exposes bad defaults (e.g. dune, opam and ocaml-lsp-server are the prominent examples).
- You use Core/Base + Stdio (library maintainers can’t do that).
- You use Containers.
- You use Batteries.
Every solution here is slightly incompatible with the others, which leads to another point:
- You need to pick one of the bigger stdlib alternatives or roll your own, and then pick a few other supporting libraries (e.g. Fmt, Logs, hamt, vectors, <whatever>).
- You need some deriving mechanism for showing and comparing data <insert other needs here>.
- You need to choose between ppx_jane and ppx_deriving. And let’s not get started on the Sexps divide.

All of this is stuff that beginners/newcomers/regular devs need to get through and find info on, and it’s not a great dev experience.

So my questions are:

- Are there plans to evolve the OCaml stdlib beyond what’s needed to write the compiler?
- Are there plans to provide standard deriving mechanisms with OCaml?
- What are the obstacles or objections?

octachron · January 19, 2022, 11:55am

Nowadays, the standard library is independent of “what’s needed by the compiler”. Typically, the compiler does not require the use of complex numbers yet. Similarly, the compiler has its own utility functions that are not part of the standard library (we have a strongly connected components module for instance).

The standard library is totally open to collaboration and enhancement, however its design process is mostly based on unanimous consensus which does slow down updates considerably.

For derivers, and many other topics, the plan is to let external libraries explore and solve those issues. As far as I know, the compiler developer hat does not grant omnicompetence. It thus seems much more sensible to let domain experts write libraries corresponding to their domain of expertise.

Personally, I don’t see the existence of choices as an issue.

bluddy · January 19, 2022, 12:17pm

I agree that this is an issue, specifically for beginners and those just taking a quick look at the language. The stdlib is improving gradually, but unless a specific committee is formed to work on it separate from the core dev team, not much is going to improve on that front.

My personal recommendation would be to use Containers unless you specifically need compatibility with Jane Street libraries, which is something that might be necessary for a commercial operation requiring battle-tested code.

cemerick · January 19, 2022, 2:38pm

The absolute worst possible outcome vis a vis “standard libraries” is if the one that comes with a language’s default toolchain is deeply flawed in some way(s) that cannot be recovered in userland. With this in mind, I’m very thankful that OCaml’s Stdlib has evolved deliberately to avoid accumulating grievous flaws or vestigial footguns. That’s not to say that Stdlib doesn’t have problems, just that it’s mostly a perfectly fine foundation for userland to build upon. Relative to the status quo in other languages, that’s saying a lot!

My personal experience is that, since starting with OCaml (again) three years ago, I spent far more time being concerned about which stdlib to use than I should have. For application development, the easy answer is just “use them all if you like”; there’s absolutely nothing stopping you from picking and choosing bits from each of Stdlib and Base and Core and Containers and Batteries. (This is in contrast to languages where you really can’t mix and match essential libraries, e.g. you have to choose just one prelude in Haskell.) My largest application happens to depend heavily upon containers, and uses select bits from batteries and core. I’ve not run into anything like incompatibilities; it’s fine.

As for the rest, having a diversity of libraries to choose from is good. I wish there were 10x more ocaml libraries to choose from on every topic, not less, but that comes with time.

Leonidas · January 19, 2022, 2:41pm

I come to different conclusions than you do:

- I would use Base/Stdio. Probably not Core(_kernel), although these have few enough dependencies these days that I find it feasible to use them. It is annoying though if I need something from Core and then it pulls in 20+ ppx_* packages.
- I wouldn’t use Batteries at all. I don’t find the abstractions that Batteries adds to be particularly useful or great to expose to library consumers. Your code ends up with Bat* types for which you then need to use Batteries functions, so it doesn’t play too well with other libraries.

hyphenrf · January 19, 2022, 2:56pm

Would be nice if library choice didn’t influence binary size as much. (speaking from the application dev side of course).
I remember reading about a dead code elimination PR that split off to a separate tool, but my memory of it is hazy.

c-cube · January 19, 2022, 3:16pm

I’d like to mention that containers tries to not be incompatible with
the stdlib at all. However, there are some new types in it (e.g. Heap,
Vector) that will not be compatible with other alternative stdlibs’
equivalent features.

As for the standard deriving, I also wish the compiler could come with
the basic “deriving” plugins, just like Rust does (equality, comparison,
hashing, and printing). Alas the ppx world depends on ppxlib which is
not part of the compiler distribution.
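
For readers unfamiliar with what such derivers save you from, here is a minimal hand-written sketch of the four operations listed above (equality, comparison, hashing, printing) for one small record, using only the stdlib. The type `point` and all function names are illustrative assumptions, not the output of any particular ppx.

```ocaml
(* A small record type; the name and fields are made up for illustration. *)
type point = { x : int; y : int }

(* Equality: field-by-field. *)
let equal_point a b = a.x = b.x && a.y = b.y

(* Comparison: lexicographic on (x, y). *)
let compare_point a b =
  match Int.compare a.x b.x with
  | 0 -> Int.compare a.y b.y
  | c -> c

(* Hashing: reuse the polymorphic hash on a tuple of the fields. *)
let hash_point p = Hashtbl.hash (p.x, p.y)

(* Printing: a Format-style printer plus a string-returning wrapper. *)
let pp_point fmt p = Format.fprintf fmt "{ x = %d; y = %d }" p.x p.y
let show_point p = Format.asprintf "%a" pp_point p

let () = print_endline (show_point { x = 1; y = 2 })
```

This is the boilerplate that has to be repeated, and kept in sync, for every type when no deriver is available.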

cemerick · January 19, 2022, 3:25pm

I agree, though my personal level of concern is pretty low; having spent zero time attempting to minimize them, the applications I work on have executable sizes < 50MB, which is quite excellent IMO given the domain and relative to any other plausible language option. I don’t think binary size should motivate library choices though; the former is fundamentally a build tooling concern.

lucas-deangelis · January 19, 2022, 4:06pm

As a beginner in OCaml, I’ve only ever used the standard library. I like to see small quality of life improvements land from time to time. I feel like it’s “enough” for most of my usage (mostly CLI tools or web applications), but since Core, Base, Batteries and Containers exist, I think I may be missing something.

I’ve used other languages where the standard library is bigger and more opinionated, and I’m not sure how this would work with OCaml. For example, an HTTP server and client in the standard library doesn’t sound like the kind of thing anyone would want. On the other hand, something like the flag package from Go sounds small and low impact, and it really helps.
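
For comparison, the stdlib does already ship a small command-line parser, the `Arg` module, which covers part of what Go's flag package does. Below is a minimal sketch; the program name `greet`, its options and the `parse` helper are hypothetical, and the argument vector is passed in explicitly so the example does not depend on the real command line.

```ocaml
(* Option values live in references that Arg updates while parsing. *)
let verbose = ref false
let count = ref 1
let name = ref "world"
let rest = ref []   (* positional (anonymous) arguments, in reverse order *)

(* Each entry is (flag, action, documentation string). *)
let specs = [
  ("-v", Arg.Set verbose, " Enable verbose output");
  ("-n", Arg.Set_int count, "N Number of greetings (default 1)");
  ("--name", Arg.Set_string name, "NAME Who to greet");
]

(* Hypothetical helper: parse an explicit argv instead of Sys.argv. *)
let parse argv =
  Arg.parse_argv ~current:(ref 0) argv specs
    (fun anon -> rest := anon :: !rest)
    "greet [-v] [-n N] [--name NAME] [ARGS...]"

let () =
  parse [| "greet"; "-n"; "3"; "--name"; "OCaml"; "extra" |];
  for _i = 1 to !count do
    Printf.printf "Hello, %s!\n" !name
  done
```

A real program would usually call `Arg.parse specs anon_fun usage` instead, which reads `Sys.argv` and prints the usage message on errors.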

One idea that I’ve seen a few times here would be to make an “opinionated” OCaml distribution. It would be OCaml, distributed with a set of blessed libraries that would handle common tasks: HTTP requests, logs, database connection, cli arguments, testing, benchmark, maybe even GUI? This could be independent of the OCaml stdlib and allow some experimentation, and an easier environment for beginners. I think what’s offered by the Go standard library would be a good start, though we could have more containers easily since their library was made before generics.

The vision I have in my head is a bit like what’s made in the React world with create-react-app: you could have a tool to wrap around dune and opam and add all the libraries, and if you want you could opt-out (eject in create-react-app), and you’re left with a regular dune, opam and OCaml project.

For other issues like ppx_deriving_yojson vs ppx_yojson_conv, that may be where an ecosystem “guide” would help. From what I understand, ppx_yojson_conv doesn’t support custom conversion functions like ppx_deriving_yojson, so if you need one, the choice is already made.

Since 5.0 is coming, maybe now would be a good time to make a big “state of the ecosystem” survey, asking people what they use, what they need, referencing the different options, and seeing where more energy/help is needed. A survey was done in 2020 but I don’t see anything for 2021.

yawaramin · January 19, 2022, 6:25pm

Exactly. In fact I have a blog post sitting in draft for a while now where I wrote up something about this ‘blessed’ stack:

No ‘blessed distribution’ with standard set of libraries

This is I think the main reason people complain that OCaml ‘doesn’t have a big package ecosystem’. Yes, of course, OCaml is missing libraries for many SaaS offerings and other tools. But it actually has quite a respectable set of packages in the opam index.

It’s just that, for newcomers there’s no coherent story about what packages to use when trying to do something. If we look at the Python or Go ecosystem, the huge advantage their standard library gives them is that all the libraries you need for day-to-day work are right there in the standard library, maintained and documented: JSON, async, HTTP, dates and times, command line argument parsing, decimal numbers. In OCaml you’re left to sniff out what people consider the equivalent good libraries almost entirely by word of mouth.

So what can be done about this? I think what is needed is a ‘blessed distribution’ of standard opam packages, with guidance on the basics of using them. Of course, this may be a lot of work. But perhaps not as much as maintaining a giant library like the Python standard library: with opam’s dependency management capabilities, we can write an opam project file whose sole purpose is to list the ‘blessed’ set of packages as dependencies. This project can even contain some basic documentation about the chosen packages! Imagine a doc page with a table of contents like the Python standard library reference, installed along with all its packages just by doing opam install blessed (e.g.).

In fact, why stop there? Because opam is source-based, it can take quite a long time to download and build this blessed set of packages. Imagine having the builds cached for common architectures and OSs already somewhere in the cloud, and not just that, but providing a single operating system package that would set up opam and install the blessed package set in one shot. No more fiddling around with installing opam, initializing opam, setting up the switch, and installing packages. Instead, download a single binary installer with all needed tooling, copy the packages into place, and set up the toolchain near-instantly. This is probably doable right now; it needs effort, of course.

Maelan · January 19, 2022, 6:33pm

Well, an all-set HTTP server is one extremely specific application indeed, but I don’t think that’s the kind of thing people (or I at least) miss most often. Friction with the stdlib starts at a much lower level of sophistication. I’m not familiar with JaneStreet’s libs and perhaps Base goes quite far indeed but, to start with, all these stdlib extensions/replacements provide very basic things, such as sequences (Seq is a recent addition to the stdlib), extensible arrays, bitvectors, more generic IO infrastructure and whatnot. Plus, they can polish the edges of the standard library (e.g. having higher-order functions take labelled arguments ~f).
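
As a small illustration of the labelled-argument style mentioned above: the stdlib itself offers labelled variants of some modules (`ListLabels`, `ArrayLabels`, and so on), so a sketch does not need any third-party library. The values `doubled` and `total` are just example names.

```ocaml
(* With a label, the function argument can come first or last. *)
let doubled = ListLabels.map ~f:(fun x -> x * 2) [1; 2; 3]

(* Labels also document which argument is the accumulator. *)
let total = ListLabels.fold_left ~f:( + ) ~init:0 doubled

let () = Printf.printf "%d\n" total
```

The alternative stdlibs differ in which label names they use and how consistently they apply them, which is part of the "polish" being discussed.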

Even when you stick to standard data structures, my personal experience with the standard library is that, half a dozen times per programming session, I acknowledge the nonexistence of a function I was expecting (often by analogy with other functions).

[ For specific examples: for quite a long time there had been no List.init although there was Array.init (that one example struck me many times); nowadays there is no Seq.init; there is no Array.init_matrix although there are Array.init and Array.make_matrix; there is no Set.S.to_rev_seq_from although there are Set.to_seq_from and Set.to_rev_seq; and the list goes on. Also, I’m always astonished that there is no function composition operation (even in the brand new module Fun!) and I can only assume that is a deliberate design choice? At the module level, until recently, the standard library didn’t have a systematic approach to datatype-centered modules and you had to define trivial modules yourself to feed to functors. ]

Most often this is trivial code you can write yourself, but that’s what a library is about.
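
To make "trivial code you can write yourself" concrete, here is a sketch of a few of the helpers named above, written against the stdlib as it stood when this was posted. The names `seq_init`, `init_matrix`, `compose` and `IntSet_old` are hypothetical and chosen so they do not clash with anything the stdlib may provide.

```ocaml
(* Lazily yields f 0, f 1, ..., f (n - 1). *)
let seq_init n f =
  let rec go i () =
    if i >= n then Seq.Nil else Seq.Cons (f i, go (i + 1))
  in
  go 0

(* A rows x cols matrix whose cell (i, j) holds f i j. *)
let init_matrix rows cols f =
  Array.init rows (fun i -> Array.init cols (fun j -> f i j))

(* Function composition: compose f g applies g first, then f. *)
let compose f g x = f (g x)

(* Before datatype-centered modules such as Int existed, functor
   arguments had to be spelled out by hand like this. *)
module IntSet_old = Set.Make (struct
  type t = int
  let compare = compare
end)

(* With the Int module the same thing is one line. *)
module IntSet = Set.Make (Int)

let () =
  let squares = List.of_seq (seq_init 4 (fun i -> i * i)) in
  List.iter (Printf.printf "%d ") squares;
  print_newline ()
```

None of these is hard, but each one is a small detour taken at the moment you expected the function to exist.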

nojb · January 19, 2022, 8:00pm

This used to come up all the time in the caml-list back in the day. In short, one big problem with the composition operator is that due to the value restriction it “loses” polymorphism: even if let h x = f (g x) is polymorphic, let h = f \circ g won’t be since it is not generalized.

https://inbox.ocaml.org/caml-list/199812081702.SAA29130@pauillac.inria.fr/
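
The loss of polymorphism described above can be reproduced in a few lines. This is a minimal sketch; `compose` is a hypothetical named stand-in for the \circ operator, and `id` stands in for f and g.

```ocaml
let compose f g x = f (g x)
let id x = x

(* Eta-expanded: a function definition is a syntactic value,
   so h is generalized to 'a -> 'a. *)
let h x = id (id x)

(* Point-free: the right-hand side is a function application, not a
   syntactic value, so h' only gets a weak type: '_weak1 -> '_weak1. *)
let h' = compose id id

let () =
  assert (h 1 = 1);
  assert (h "a" = "a");   (* fine: h is truly polymorphic *)
  assert (h' 1 = 1)       (* this first use fixes h' to int -> int *)
  (* assert (h' "a" = "a") would now be rejected by the type checker *)
```

Eta-expanding (`let h' x = compose id id x`) restores the polymorphism, at which point the composition operator has saved very little typing.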

Personally I find almost all uses of infix operators (other than the usual mathematical ones) to harm readability in large-scale programming.

Cheers,
Nicolas

cjr · January 19, 2022, 11:39pm

I’m absolutely certain that the lack of a “batteries-included” stdlib is off-putting to a number of potential OCaml users. But isn’t the downside exactly what @cemerick was saying? Flawed/incomplete implementations are added to the stdlib but can never be removed b/c people are depending on them. So another module is added. I don’t know Go, but one of the things that drove me away from Python was the constant churn of its stdlib.

cemerick · January 20, 2022, 1:26am

I think this is overstated. The set of languages that are actually “batteries included” (i.e. one requires no third-party dependencies to do one’s work) is vanishingly small. (Mathematica and Excel come to mind, but there are good and bad reasons for how they came to be as they are.) Certainly every nontrivial project in Python or Go ends up depending on some libraries for something, just like in every other general-purpose programming language. Just look at modern Java; even after a decade of (quite good) modernization work, and growing the standard library from huge to truly gargantuan, there remain dozens if not hundreds of libraries commonly considered to be essential.

This dynamic doesn’t come about because language X doesn’t have a large standard library, it happens because except for truly essential functions, the community ecosystem will always outpace any standard library in terms of being responsive to users’ needs (including concerns re: compatibility, ergonomics, style [functional vs imperative vs object-oriented vs monadic vs etc…], performance, whatever). Of course, everyone will disagree with what those “essential functions” are…but that’s just another tentpole of why a maintainer of a small standard library might reasonably shy away from any given suggested expansion.

Chet_Murthy · January 20, 2022, 1:56am

Indeed, just look: even with what you say being true, the standard library contains all sorts of stuff that arguably shouldn’t be in there. I mean … XSL processor ? XML parser? JDBC connection pool? Really? I worked inside the JVM and with Java-based enterprise products for over a decade (from the jump) and geez, when we found bugs in there (and there were indeed bugs regularly) it would be much harder to get a fix, than if the libraries were separate.

yawaramin · January 20, 2022, 2:15am

Exactly. And my suggestion of a ‘blessed’ set of packages addresses both of these concerns in one swoop.

cemerick · January 20, 2022, 2:49am

I didn’t get into the notion of package curation because it really is a whole different cat than the big-vs-little stdlib, and maybe even harder to skin.

Of course, anyone can do what you suggest, and to some extent, Jane Street already does a flavour of this by publishing core and everything else in their public portfolio. (The OCaml Platform effort is similar but focused on a much smaller bit of terrain.) But presumably that’s not sufficient, and so the blessings must come from e.g. the core lang dev team, or maybe some community-sourced committee? There lies a host of sticky political problems around whose libraries get selected and whose don’t, and what effect those selections have on the community dynamics of all of the candidate libraries (and those yet to come in each category).

Most of all though, I think the suggestion of “blessed” sets of packages begs the question that was asked originally: as soon as blessings are placed upon (for example) Core + N other JS libraries, wouldn’t a competing set of “blessed” packages built around e.g. containers + N other community libraries crop up? And then surely a minimalist Stdlib-focused collection would come about eventually. This sounds a lot like a bunch of sibling frameworks, which is great (just look at Dream and opium and eliom and…), but with the addition of all the politics that would go with having the curation done by someone with enough authority to effect the kind of PR simplicity that could cut through the “which stdlib?” noise.

yawaramin · January 20, 2022, 3:25am

Doing it correctly will require a lot of work, sure. But anyone can start doing it relatively easily by just compiling a list of known-good packages and writing a paragraph about each. OCamlverse and awesome-ocaml already do almost this. I think they just need a bit more structure. I’m imagining something more like the Python standard library’s table of contents page.

As to whose blessed set should be considered the blessed set? Who knows. I think first there needs to be one before that question can be addressed.

bluddy · January 20, 2022, 7:40am

Having a blessed set of packages also helps synchronize libraries on a specified set of abstractions. For example, image libraries in OCaml currently create their own abstract type for images. But it would be far more useful for everyone to converge on Owl’s data type, similar to the way numpy's ndarray is used by the entire python world. This is close to having the benefit of the datatype in the stdlib. Basically, a blessed set of packages is very close to expanding the standard library.
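
A rough sketch of what converging on one datatype buys: two independent modules interoperate without conversion because both accept the same concrete type. Here the stdlib's `Bigarray` is used as a stand-in for the shared type (an assumption made to keep the example dependency-free; the post suggests Owl's type), and the `Loader` and `Filter` modules are hypothetical.

```ocaml
(* The shared abstraction: an 8-bit grayscale image, rows x columns. *)
type image =
  (int, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array2.t

(* Hypothetical "library A": produces images. *)
module Loader = struct
  let blank ~width ~height : image =
    let img =
      Bigarray.Array2.create Bigarray.int8_unsigned Bigarray.c_layout
        height width
    in
    Bigarray.Array2.fill img 0;
    img
end

(* Hypothetical "library B": transforms images in place. *)
module Filter = struct
  let invert (img : image) : unit =
    for y = 0 to Bigarray.Array2.dim1 img - 1 do
      for x = 0 to Bigarray.Array2.dim2 img - 1 do
        img.{y, x} <- 255 - img.{y, x}
      done
    done
end

(* No glue code is needed between the two libraries. *)
let img = Loader.blank ~width:4 ~height:2
let () = Filter.invert img
let () = Printf.printf "%d\n" img.{0, 0}
```

If each library instead defined its own abstract image type, the caller would have to copy pixel data between them at every boundary.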

mro · January 20, 2022, 7:57am

Hi @0xa2c2a,

thanks for caring and speaking up. To me OCaml is a tool rather than a religion. So if it doesn’t suit my needs, I don’t have to pick it. (Can’t speak for anybody else and surely not for the Janestreeties)

Second, I love OCaml for its trustworthiness. That comes from slowness and industry independence. (I left Go for that, avoided PHP and never touched Node.)

So the points you mention are all valid, but no showstoppers for the current community, I guess. And I wouldn’t want another one, even if it was for world domination.

Sustainability is way more important than growth. And that means to not disappoint the members in the first place.

A lot of people here publish opam packages, contribute to dune or give advice. They do so because they care and thus improve the ecosystem.

Would you blog about your experience and tell about it here? What finally worked, what didn’t? That helps a lot, both immediately for peers (another first-hand experience report! There aren’t that many) and for the community to see the obstacles.
