Weird (off by one?) behaviour of a trie tree I'm sketching out (still underperformant)

0x0f0f0f · January 22, 2020, 12:42am

I’m working at a REPL completion/hinting feature for my programming language using the libraries ocamline and linenoise by keeping keywords that are in the environment inside a trie data structure.
It looks like it is working fine, except that it works only after an odd number of insertions. After the first two insertions in the trie the completion callback fails to retreive the last character of the keyword to complete, and after the fourth insertion it stops working completely ONLY AFTER EVEN NUMBERED INSERTIONS.
This is buzzing me a while and I don’t know where to inspect tbh. The trie is naive and keeps characters in its nodes. Here’s the code:

github.com

0x0f0f0f/gobba/blob/completion/lib/completion.ml

(** This module provides the completion callback for the linenoise REPL *)

open Util

let implode cl = String.concat "" (List.map (String.make 1) cl)

let explode s =
  let rec exp i l =
    if i < 0 then l else exp (i - 1) (s.[i] :: l) in
  exp (String.length s - 1) []

module Trie = struct
  type t = Start of t list | Stop | Node of char * t list [@@deriving show]

  let empty () = ref @@ Start []

  let rec tree_of_path path = match path with
  | [] -> Stop
  | x::xs -> Node (x, [tree_of_path xs])

This file has been truncated. show original

github.com

0x0f0f0f/gobba/blob/completion/lib/repl.ml

open Types
open Errors

(** Parse and evaluate a gobba expression from a string, in a default state,
returning a couple consisting of an evaluated value (of type evt), that you can
unpack using the [Gobba.Typecheck.unpack_<name>], and the new state resulting
from the command evaluation. This is useful for embedding gobba in your ocaml
programs.
@param str The gobba string to parse and evaluate
@param dirscope A directory path: where gobba will look for files imported with a relative path. Defaults to the current directory.
@param state Specifies the current computation state with options. Uses the default state (with an empty environment) if not provided
@return A couple containing the result of the expression evaluation and the resulting state.
*)
let run_string str ?(dirscope=(Filename.current_dir_name)) ?(state=default_evalstate) () =
  let ast = Parsedriver.read_one str in
  Eval.eval_command (List.hd ast) state dirscope


(* A list of pairs of directives and their completion hints *)
let directives = [

This file has been truncated. show original

Here’s “how to reproduce”:
2020-01-22_01-31-55_scrot

jeffsco · January 22, 2020, 5:23am

You’re giving us a lot of code to look through. If you think the error is in your trie implementation, it might be worth looking for a way to reproduce the problem just using the trie itself and a smallish number of calls to the trie API. In fact if you do this you’ll probably see the problem yourself. [Smiling colon hypen parenthesis here.] But even if you don’t, it will make it much easier for others to think about what’s going wrong.

gasche · January 22, 2020, 9:44am

Two remarks:

there are existing trie libraries for OCaml, why not use one of them?
(there is a trie module in opam, but I would naturally go for the trie module of Jean-Christophe Filliatre)
the type declaration for your trie looks odd, you have

  type t =
  | Start of t list
  | Stop
  | Node of char * t list

while I would expect something like

type 'a t =
| Leaf of 'a
| Node of (char * 'a t) list

(this is the general case where tries carry information and not just absence/presence of a word. If you know you will never need this, you can remove the type parameter and rename Leaf into Stop.

I think that the bug may go away if you rewrite your code with a better type definition: correct definitions encourage you to write correct code.

dbuenzli · January 22, 2020, 10:30am

Another source of inspiration could be cmdliner’s trie which has similar aims, namely completion or hinting.

gasche · January 22, 2020, 10:46am

Actually looking at Daniel’s code immediately made obvious that my proposed type signature above is not quite right: you want to store values at each node of the trie, as a key may be a prefix of another. Next iteration:

type 'a t =
| Node of 'a option * (char * 'a t) list

Even in the case where you only want sets not maps (no information carried for each word in the trie), you need to turn the 'a option in a bool parameter to know if each prefix is in fact a word in your dictionary.

ivg · January 22, 2020, 1:39pm

That’s a rather odd implementation, I’m not sure it is a prefix tree at all. A trie should be a tree, where each node has multiple children and optional data. Here is a simple implementation of an imperative trie that focuses on performance. Here are the docs. Feel free to use it.

0x0f0f0f · January 22, 2020, 2:16pm

Thank you everybody for the suggestions. I’ve managed to make a working draft with an immutable trie. I’ll focus on optimizing it in the next days. Thanks especially to @gasche for the proposed signature, which is what I’ve used. If you want to try it I’ve pushed to the completion branch of gobba. Do you know any good micro benchmarking library that I can use to test my language? Something that produces already plotted charts or easily plottable data.

Topic		Replies	Views
Interactive OCaml development with utop in Emacs Learning	5	3077	July 13, 2020
[ANN] rtree 0.1.0 Community	6	707	August 9, 2023
Simple Char Trie - Return Original Strings Learning	1	676	January 11, 2019
[ANN] ocaml-lsp-server 1.8.0 Community announce	4	1382	September 11, 2021
[ANN] New release of Monolith Community	0	141	November 29, 2024

Weird (off by one?) behaviour of a trie tree I'm sketching out (still underperformant)

Related topics