Interesting OCaml Articles

Heh. I remember when I started learning Golang (at Google). I was shocked and appalled at how terrible the language was. When I learned that an interface with N implementing concrete types, had N+1 distinct nil values, I (figuratively) puked. I wasn’t very impressed by the documentation and tutorials about Golang, either: they’re great for gently ascending the learning curve for complete newbies, but terrible for actually learning Golang to the point where you can actually, actually, ACTUALLY know what a piece of code does by reading it.

Maybe that’s where I’m coming down: I’m sure he’s a smart guy, but he’s channelling the experiences of his team of newbies. Which, sure, that’s what the commercial world is like. And for sure, for OCaml to be successful, which we should all wish for, that world has to be addressed, and pandered-to. It’s reality, as I learned (again, at the tip of the spear, as it cut into my innards) working with Java, the JVM, and large Java-based products, as it was initially commercialized.

[I remember the day that it became clear that all strings and character-oriented I/O would be via immutable Strings, and not using some UTF8-encoded collection of types. Hey, gotta pander to the idiots, it’s how we get paid. I’m cool with it.

I’m just thankful that OCaml’s core doesn’t mandate all of that rubbish, and I can build things properly.


I have recently bumped into a paper called The Case for Writing Network Drivers in High-Level Programming Languages. Author concluded that OCaml’s implementation is not very good in terms of latency (Figure 3a). Did anybody look into their implementation?

1 Like

That case was pretty conclusively proven in 1997 by Mark Hayden’s Ensemble (PhD thesis, Cornell). he achieved 10us overhead on 75us round-trip Myrinet running full virtual synchrony (which is a lot more than TCP).

If you’re interested in learning more, this paper was also discussed in this thread, where @Reperator, one of the authors, provided some additional context.

1 Like

Not specifically an OCaml article, but this paper Quantifying the Performance of Garbage Collection vs. Explicit Memory Management has people talking about Why Automatic Garbage Collection Is Expensive again.

It sparked an interesting Twitter exchange between the Emery Berger, Chris Lattner and our own @Yaron_Minsky.


Always surprising what gets published, I guess. Yaron is being generous and kind. It would seem, uh, blisteringly obvious that the reason I use a GCed language is that I’m willing to trade memory for programmer-hours. And this is a real cost: those programmer-hours can be used in other optimization work that can yield great rewards.

There was a (at the time) famous example of a Taligent application that drew four rectangles, and a trace revealed that (inexplicably) it did some massive number of printf-operations. These guys were funded as a startup to redo what NeXT did with Objective C, only with better resource utilization, b/c Objective C was, y’know refcounted, and we can do better with explicit memory-management.

The entire resurgence of O-O business application servers was made possible by the rise of GCed language runtimes: the Object Management Group and armies of programmers at big enterprise software companies invested billions of dollars in trying to build what became J2EE, doing it in C/C++, and failing miserably. I. Mean. Miserably. Bloated systems that were unstable as heck, with programmer-induced memory-leaks, memory-errors, and all manner of failure. When Java came out, programmers RAN from C++ to Java, because it was so difficult to write decent business-logic server systems in C++. That is to say, sure, with sufficiently skilled programmers, and a sufficiently limited and unchanging requirement-set, you can do it. But if you’re actually trying to run a business, you can’t afford that quality of programmer (even GMail is famously written in Java, not C++), can’t keep your requirements simple, and can’t keep them from changing.


Always surprising what gets published, I guess.

I disagree and find it unfair. What you describe is of course right, but at the time the article was published, explicit memory management that was safe and automatic had not emerged yet but was an active area of research. The debate on the thread was about tracing GC vs. prompt deallocation, not necessarily manual. I find the paper interesting, like Yaron’s/Stephen Dolan’s fine point, which is important to keep in mind when seeing the paper being quoted as “GC requires 4x as much memory”.


Looking even closer, I see that this is done on JikesRVM. That’s a research toy, and nobody puts their best GC algorithms into research toys (I was at IBM Research, worked for the same folks). Furthermore, they didn’t compare their work against Bacon et al’s RC-GC (implemented in JikesRVM), AFAICT. I mean, if you’re going to put in explicit memory-management, I’d think you’d want to compare against RC-GC also, no? [Looks like they’re comparing against an Appel generational stop-and-copy collector]

Look: I get that researchers have to do “research”. But I know from experience that the amount of improvement that can be eked-out of even really awful starting-point GCs is quite substantial: I watched as guys improved the IBM [ETA: edited] product JVM’s GC considerably, using all sorts of low-level tricks.

ETA: I mean, what did we learn that we didn’t know already from work going back three decades? That explicit memory-management can use less memory than generational stop-and-copy GC? Check. That copying GC can be faster than explicit memory management? Check. That there’s a space-time tradeoff there? Check.


Let me share my new blog post on understanding format6 with examples.

It’s almost my reading note for the paper Format Unraveled (on module Format) and experiments on utop. I tried not to be too verbose though.


Well, I made a sequel of format6 post,
Understanding format6 in OCaml by diagrams

This time I just use four examples with four diagrams e.g. it’s the one for Scanf.sscanf

p.s. It’s a pity that I missed Gabriel’s post The 6 parameters of (’a, ’b, ’c, ’d, ’e, ’f) format6 after writing that one.


Not primarily a programming article but I thought this is an interesting exception because it may be the first time OCaml has been mentioned in the Financial Times: Jane Street: the top Wall Street firm ‘no one’s heard of’  | Financial Times


However, OCaml is not given a very honorific qualification: a recondite programming language!

1 Like
1 Like

Barring some serious questions about the author’s taste in language design:

Similarly, Rust the language is quite pretty, syntax-wise. Meanwhile, OCaml is so ugly that the community came up with a whole other syntax for it.

I have to agree with the author’s complaint regarding the lack of effective support for metaprogramming in OCaml:

Macros in Rust are great! OCaml has PPXes, which are separate binaries that you build using the OCaml compiler toolkit. They have a very high barrier to entry, and I’ve never built one, and really struggled to even understand the ones I use.


Pretty and ugly is so subjective. I’m always surprised when programmers fall into that trap. It’s almost always a matter of what you’re used to.


You know their oracular memory management used traces to insert “free” as early as possible for a given input. So the programs benchmarked weren’t even necessarily correct in the general case!

1 Like

Too funny. Too funny.

1 Like

Hi, I would like to share my recent article about GADTs and state machines: GADTs and state machine

It’s another introduction about GADTs and it explains a bit what I did for Enjoy it and happy hacking!


An interesting paper that uses OCaml is by Francois Pottier, which gives a declarative DSL for implementing type rules with applicative functors. It has an associated library, opam - inferno.


Our @yallop is one of the authors of this paper about parsing.

flap: A Deterministic Parser with Fused Lexing

Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity and reduces the potential for parsing ambiguity. However, materializing tokens as data structures and case-switching on tokens comes with a cost. We show how to fuse separately-defined lexers and parsers, drastically improving performance without compromising modularity or increasing ambiguity. We propose a deterministic variant of Greibach Normal Form that ensures deterministic parsing with a single token of lookahead and makes fusion strikingly simple, and prove that normalizing context free expressions into the deterministic normal form is semantics-preserving. Our staged parser combinator library, flap, provides a standard interface, but generates specialized token-free code that runs two to six times faster than ocamlyacc on a range of benchmarks.