I’m happy to announce the release of my library
mula. The package uses Universal Levenshtein Automata (ULA) to not only check if a word is within a certain edit distance of another, but to also output what the edit distance is! It uses the automata themselves to calculate edit distances. A fun use case for this is that we can feed a set of words to the automaton and immediately rank the words by their edit distance.
Mula supports both the standard Levenshtein edit distance as well as the Demarau-Levenshtein distance which counts transpositions of two adjacent characters as a single edit. I also support getting live error counts, so you can feed part of a string into an automaton, and get the minimum number of errors that have occurred so far.
I currently have matching working using non-deterministic ULA, but I have partially started the work toward the deterministic versions. It should be possible to pre-compute the DFAs for up to edit distance 3 and pack it with the library, never needing to be recomputed because the Universal Automata are independent of the input strings. But the non-deterministic automata support very large edit distances: (Sys.int_size - 1)/2, so they have value on their own.
This library came about from a desire to add a “did you mean” feature to a toy compiler, but not wanting to write the kind of dynamic programming code that you can find in the OCaml compiler  or merlin/spelll [2,3].
- Edit distance in the OCaml compiler. ocaml/misc.ml at e5e9c5fed56efdd67601e4dbbaebeb134aee361c · ocaml/ocaml · GitHub.
- Edit distance in merlin. merlin/misc.ml at 444f6e000f6b7dc58dac44d6ac096fc0e09894cc · ocaml/merlin · GitHub
- Edit distance in spelll. spelll/Spelll.ml at 3da1182256ff2507a0be812f945a7fe1a19adf9b · c-cube/spelll · GitHub