Looking even closer, I see that this is done on JikesRVM. That’s a research toy, and nobody puts their best GC algorithms into research toys (I was at IBM Research, worked for the same folks). Furthermore, they didn’t compare their work against Bacon et al.’s RC-GC (implemented in JikesRVM), AFAICT. I mean, if you’re going to put in explicit memory management, I’d think you’d want to compare against RC-GC also, no? [Looks like they’re comparing against an Appel generational stop-and-copy collector]
Look: I get that researchers have to do “research”. But I know from experience that the amount of improvement that can be eked out of even really awful starting-point GCs is quite substantial: I watched as guys improved the IBM [ETA: edited] product JVM’s GC considerably, using all sorts of low-level tricks.
ETA: I mean, what did we learn that we didn’t know already from work going back three decades? That explicit memory-management can use less memory than generational stop-and-copy GC? Check. That copying GC can be faster than explicit memory management? Check. That there’s a space-time tradeoff there? Check.
Barring some serious questions about the author’s taste in language design:
Similarly, Rust the language is quite pretty, syntax-wise. Meanwhile, OCaml is so ugly that the community came up with a whole other syntax for it.
I have to agree with the author’s complaint regarding the lack of effective support for metaprogramming in OCaml:
Macros in Rust are great! OCaml has PPXes, which are separate binaries that you build using the OCaml compiler toolkit. They have a very high barrier to entry, and I’ve never built one, and really struggled to even understand the ones I use.
Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity and reduces the potential for parsing ambiguity. However, materializing tokens as data structures and case-switching on tokens comes with a cost. We show how to fuse separately-defined lexers and parsers, drastically improving performance without compromising modularity or increasing ambiguity. We propose a deterministic variant of Greibach Normal Form that ensures deterministic parsing with a single token of lookahead and makes fusion strikingly simple, and prove that normalizing context free expressions into the deterministic normal form is semantics-preserving. Our staged parser combinator library, flap, provides a standard interface, but generates specialized token-free code that runs two to six times faster than ocamlyacc on a range of benchmarks.
We published a blog post that might be interesting to OCaml devs.
When working with large codebases such as Tezos Octez, it is important to make the code highly readable.
Discover “labelled type parameters” - a lesser-known OCaml trick used by Nomadic Labs devs to reach this objective: Nomadic Labs - Labelled type parameters in OCaml
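For readers who have not opened the post yet, here is a minimal sketch of how I understand the trick: instead of several positional type parameters, a type takes a single parameter constrained to be an object type, so that the method names act as labels. The `table` type, its `key` and `value` labels, and the helper functions are hypothetical names chosen for illustration, not taken from Octez; the blog post remains the reference.

```ocaml
(* Positional style: the reader has to remember which parameter is which. *)
module Plain = struct
  type ('key, 'value) table = ('key * 'value) list
end

(* Labelled style (hypothetical example): a single parameter constrained to
   be an object type. The object is never built at run time; it only serves
   as a type-level record whose method names label the parameters. *)
module Labelled = struct
  type 'a table = ('k * 'v) list
    constraint 'a = < key : 'k ; value : 'v >

  let empty : < key : 'k ; value : 'v > table = []

  let add (k : 'k) (v : 'v) (t : < key : 'k ; value : 'v > table)
      : < key : 'k ; value : 'v > table =
    (k, v) :: t

  let find (k : 'k) (t : < key : 'k ; value : 'v > table) : 'v option =
    List.assoc_opt k t
end

(* At the use site the labels document the role of each type argument. *)
let ages : < key : string ; value : int > Labelled.table =
  Labelled.(empty |> add "alice" 30 |> add "bob" 25)
```

The order of the labels inside the object type does not matter, which is part of the appeal once a type has four or five parameters.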
I just finished reading this. Thank you for posting. Can I please please please suggest you make a front-page article about it? And maybe put the “Conclusions” in the post, to kickstart discussion? I have to say: I was surprised to learn that even UTF-32 is not fixed-width.
ETA: we should all read this article. All of us. And I basically never write anything user-facing. Still, I’m glad I read this.
Note that this article is still an approximation, in particular extended grapheme clusters don’t really match human-perceived characters because they can’t account for ligatures. For instance, a font renderer may decide to render aesthetic as æsthetic. Similarly, some scripts have a quite complex text layout where segmentation into individual characters is subjective: how many characters in द्ध्र्य, ශ්ර, , or ﷺ ? Note that I can’t even predict the answer on your screen for the first two because it will depend on how well your system fonts support indic scripts, and typically on my system द्ध्र्य and द्ध्र्य are rendered with a different number of “characters”. And this is not even touching the issue of hieroglyphic control characters whose implementations are a work-in-progress … everywhere.
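For anyone who wants to see the "UTF-32 is not fixed-width" point concretely, here is a small sketch that counts Unicode scalar values in a UTF-8 string using only the standard library. It assumes OCaml 4.14 or later (for `String.get_utf_8_uchar`); `count_scalar_values` and the example strings are names made up for this illustration. It deliberately does not attempt grapheme segmentation, which the standard library does not provide and which, as noted above, is itself only an approximation of perceived characters.

```ocaml
(* Count Unicode scalar values in a UTF-8 encoded string.
   Assumes OCaml >= 4.14. Invalid byte sequences are counted as one
   (replacement) scalar value each, following the decoder's behaviour. *)
let count_scalar_values (s : string) : int =
  let rec go i n =
    if i >= String.length s then n
    else
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d) (n + 1)
  in
  go 0 0

(* U+00E9: precomposed "e with acute accent". *)
let e_acute_precomposed = "\u{00E9}"

(* "e" followed by U+0301 COMBINING ACUTE ACCENT: same perceived character. *)
let e_acute_decomposed = "e\u{0301}"

(* Two regional indicator symbols (F, R), usually rendered as a single flag. *)
let flag = "\u{1F1EB}\u{1F1F7}"

let () =
  List.iter
    (fun (name, s) ->
      Printf.printf "%s: %d bytes in UTF-8, %d scalar values\n" name
        (String.length s) (count_scalar_values s))
    [ ("precomposed", e_acute_precomposed);
      ("decomposed", e_acute_decomposed);
      ("flag", flag) ]
```

Each scalar value takes exactly four bytes in UTF-32, so the decomposed form and the flag each occupy two UTF-32 code units even though a reader sees one character. That is the sense in which a fixed-width encoding of code points still gives a variable-width encoding of what people perceive as characters.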