OCaml for Data Science

Absolutely, I do agree that even small programs benefit from static typing.

The problem is a kind of short-sightedness among beginners. Lacking experience with programming or with the language, they run into type errors frequently, and they may not yet be able to quickly understand what the compiler is telling them. Furthermore, testing and fixing bugs feels more productive to beginners than thinking carefully about a compiler message, even though the message might guide them to a solution more efficiently. There may also be an ego element: people don’t like being told by a machine that they suck at programming.

That’s why many give up too early and believe that static typing “gets in your way”, which is among the most frequent objections to OCaml I have heard. They don’t picture themselves becoming more proficient with the type system. In their minds, having to deal with these issues in small programs means that programming with static types must be an even worse experience in big programs, even though the opposite is the case. It’s hence of little surprise that languages that cater to their prejudices are more popular.

I think the level of competence at which people really get sold on modern static type systems is when they start explicitly designing their programs to leverage the type checker, but this typically takes at least months of experience. I once saw a beginner switch from sum types to matching on characters on the grounds that the compiler would then complain less about their code! It’s sort of ironic that expert users do the exact opposite, i.e. write programs such that the compiler will scream at them as often as possible. I guess it’s not just an intellectual preference: programming that way may require a certain level of emotional resilience.
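To make the sum type vs. character contrast concrete, here is a minimal sketch. The `command` type and both functions are hypothetical names made up for illustration, not the beginner's actual code:

```ocaml
(* Hypothetical example: three commands modelled as a sum type. *)
type command = Start | Stop | Pause

(* If a case is left out here, or a new constructor is added to
   [command] later, the compiler reports a non-exhaustive match. *)
let describe (c : command) : string =
  match c with
  | Start -> "starting"
  | Stop -> "stopping"
  | Pause -> "pausing"

(* Character-based version: any char is a possible input, so a
   wildcard case is needed. The wildcard also silences the compiler:
   the forgotten 'p' case below goes unnoticed until run time. *)
let describe_char (c : char) : string =
  match c with
  | 's' -> "starting"
  | 'x' -> "stopping"
  | _ -> "unknown"
```

The second version "complains less" only because the wildcard swallows every mistake, including the missing pause command.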

14 Likes

In my personal experience (coming from Python), the compiler messages weren’t that off-putting. The effort of learning to deal with types was compensated by the ‘shiny new toy’ effect.
I will say that when trying something out in OCaml I usually need to spend some time first thinking about the right types. This is a preliminary design step that takes some experience. I now think that it actually helps me structure the problem, but it can feel slow to get started with something new compared to Python. I imagine people may find this off-putting.
The biggest problem continues to be availability of (bindings to) libraries.

1 Like

I guess the fact that you are posting about OCaml is proof of some degree of self-selection. A naturally high degree of exploratory behavior is surely helpful in overcoming initial obstacles. In my experience such curiosity is sadly in the minority.

Python and R have a richer library ecosystem, but that raises the question: why? It’s not like OCaml hasn’t been around for decades already. My best guess at an explanation is the initially steeper learning curve imposed by the type system, which drives most newcomers away towards easier languages. Once they start implementing more and more libraries in those, the snowball effect takes over.

2 Likes

I’m thinking you’re right. I hesitate to recommend OCaml to colleagues because I feel it’s not a good idea to start learning it if you need to get something done quickly. I still hope for the snowball effect going forward.

It would be great if OCaml could tap into the enormous expansion of Julia libraries. What would it take to call into Julia? It has a type system too; would this be a chance to get efficient cross-language communication?

2 Likes

I don’t disagree with these points, but as someone who’s also a fan of Lisps, I think there are benefits to dynamically typed languages even when you get all of the types right, namely that you can mix types more freely when it makes your code simpler. There are dangers there, but there are tradeoffs to everything. OCaml is naturally attractive to people who weigh the tradeoffs in one direction rather than the other; I’m not trying to convince anyone here to prefer Python or Clojure, etc., and OCaml is my current language of choice. I just want to point out that people are attracted to dynamically typed languages not only because those languages let them get away with type errors during initial programming. :)
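As a rough illustration of the "mixing types" tradeoff: in Python or a Lisp you can put an integer, a float and a string in one list directly, whereas in OCaml you would typically declare a wrapper variant first. The `value` type and `to_string` function below are hypothetical names, just a sketch of that extra step:

```ocaml
(* Hypothetical wrapper type for a row holding mixed kinds of values. *)
type value =
  | Int of int
  | Float of float
  | Text of string

(* Every use of a [value] has to say what happens for each kind. *)
let to_string (v : value) : string =
  match v with
  | Int i -> string_of_int i
  | Float f -> string_of_float f
  | Text s -> s

(* A heterogeneous row, made homogeneous by the wrapper. *)
let row : value list = [ Int 1; Float 2.5; Text "n/a" ]

let rendered : string = String.concat ", " (List.map to_string row)
```

The wrapper costs a few lines up front, which is exactly the overhead that can feel unnecessary in quick exploratory code.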

I think that’s especially true of exploratory code. When you want stability and resilience, the safety of types tends to dominate.

This is one thing I’ve realized recently: all of software engineering makes assumptions about the end product – specifically, that you’re striving for a mostly stable artifact. The artifact needs to evolve, but the majority of the code needs to be stable. This isn’t necessarily the case for data science and experimentation, and many aspects seen as positive in software engineering (such as DRY) can become liabilities. To some degree this is true about types as well.

3 Likes