The current representation of strings does not make it possible to take a substring in constant time. Changing the representation to add an extra indirection would allow this, but it would incur a small additional performance cost to all existing string operations; it is not clear that it is worth it.
Some libraries propose a “substring” type that is designed for this (quite simply, a substring is a tuple of a string, an offset and a length), that you can use in the part of your algorithm that requires constant-time String.sub
. For example Batteries has a BatSubstring module.
Going in that direction, you may want to consider weirder representations, in particular “ropes”, that improve the algorithm cost of more operations (concatenation in O(log n)), at the cost of making random access non-constant-time (typically O(log n)). ocaml-rope is one implementation of ropes. Again, the extra complexity in the structure incurs an extra overhead on all operations (typically higher than for “just” substring), so this is only worth it if your code is really sensitive to the complexity of those operations.
In many cases using just strings, and Buffer
for repeated-appending operations, is the fastest option. In some cases it is helpful to keep a tree of strings around, to be concatenated at the end.