OCaml runtime: hash cons‘ing for strings?

Since OCaml‘s strings are now immutable, could the runtime system offer optionally to maintain a pool of strings that maximizes sharing? Every new string would be checked to already exist in the pool and an existing one would be reused. The pool would be a weak set to facilitate garbage collection. Obviously this trades performance for memory because it requires a lookup when a string enters the runtime system. Hence, it should be optional. Has this been considered?

You can quite easily do it manually with Weak.Make, I think. I’m not sure what the benefit to it for every string in a program would be, though, sounds like it would slow down things a lot.

I’m experimenting with this in a large application in a central place and that triggered the idea because it’s more difficult to do it everywhere.

The slowdown part depends on your usage actually: if you create most strings at the beginning and then mostly compare them, then hashconsing strings can actually be a huge benefit because comparison, or at least equality can be done insanely fast by using physical equality (so a single instruction comparing the address of the hashconsed strings), and actually, hashconsing strings is a technique regularly used in theorem provers which fit that scenario.

But in the general case, it is indeed not clear what the performance of hashconsing all strings would have, and since it’s relatively easy to do as a library (e.g. https://github.com/backtracking/ocaml-hashcons/blob/master/hashcons.mli , but I’m sure there are plenty others), I’m not sure it would be useful to add complexity to the runtime for it.