Weird (off by one?) behaviour of a trie tree I'm sketching out (still underperformant)

I’m working at a REPL completion/hinting feature for my programming language using the libraries ocamline and linenoise by keeping keywords that are in the environment inside a trie data structure.
It looks like it is working fine, except that it works only after an odd number of insertions. After the first two insertions in the trie the completion callback fails to retreive the last character of the keyword to complete, and after the fourth insertion it stops working completely ONLY AFTER EVEN NUMBERED INSERTIONS.
This is buzzing me a while and I don’t know where to inspect tbh. The trie is naive and keeps characters in its nodes. Here’s the code:


Here’s “how to reproduce”:
2020-01-22_01-31-55_scrot

You’re giving us a lot of code to look through. If you think the error is in your trie implementation, it might be worth looking for a way to reproduce the problem just using the trie itself and a smallish number of calls to the trie API. In fact if you do this you’ll probably see the problem yourself. [Smiling colon hypen parenthesis here.] But even if you don’t, it will make it much easier for others to think about what’s going wrong.

Two remarks:

  • there are existing trie libraries for OCaml, why not use one of them?
    (there is a trie module in opam, but I would naturally go for the trie module of Jean-Christophe Filliatre)
  • the type declaration for your trie looks odd, you have
  type t =
  | Start of t list
  | Stop
  | Node of char * t list

while I would expect something like

type 'a t =
| Leaf of 'a
| Node of (char * 'a t) list

(this is the general case where tries carry information and not just absence/presence of a word. If you know you will never need this, you can remove the type parameter and rename Leaf into Stop.

I think that the bug may go away if you rewrite your code with a better type definition: correct definitions encourage you to write correct code.

Another source of inspiration could be cmdliner’s trie which has similar aims, namely completion or hinting.

Actually looking at Daniel’s code immediately made obvious that my proposed type signature above is not quite right: you want to store values at each node of the trie, as a key may be a prefix of another. Next iteration:

type 'a t =
| Node of 'a option * (char * 'a t) list

Even in the case where you only want sets not maps (no information carried for each word in the trie), you need to turn the 'a option in a bool parameter to know if each prefix is in fact a word in your dictionary.

That’s a rather odd implementation, I’m not sure it is a prefix tree at all. A trie should be a tree, where each node has multiple children and optional data. Here is a simple implementation of an imperative trie that focuses on performance. Here are the docs. Feel free to use it.

Thank you everybody for the suggestions. I’ve managed to make a working draft with an immutable trie. I’ll focus on optimizing it in the next days. Thanks especially to @gasche for the proposed signature, which is what I’ve used. If you want to try it I’ve pushed to the completion branch of gobba. Do you know any good micro benchmarking library that I can use to test my language? Something that produces already plotted charts or easily plottable data.