A Random Forests classifier in OCaml


I recently started working on a Random Forests implementation in OCaml.

The classifier is supposed to work now.

Some caveats:

  • this is pretty slow (and I don’t really know how to speed it up).
    It is probably two orders of magnitude slower than sklearn! Shame on me.
  • this is not very generic: it only supports integer class labels and sparse features stored as an int IntMap
  • there is no regressor yet (only the classifier)
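
To make the data representation mentioned above concrete, here is a minimal sketch of what a sample with an integer class label and int IntMap sparse features could look like. The names `sample`, `make_sample` and `feature_value` are hypothetical and only for illustration; they are not taken from the library's interface, and reading absent features as 0 is an assumption.

```ocaml
(* Hypothetical sketch: these names are illustrative, not the library's API. *)
module IntMap = Map.Make (Int)

(* A sample: sparse features (feature index -> integer value)
   plus an integer class label. *)
type sample = { features : int IntMap.t; label : int }

(* Build a sample from a label and a list of (feature index, value) pairs. *)
let make_sample (label : int) (pairs : (int * int) list) : sample =
  { features = IntMap.of_seq (List.to_seq pairs); label }

(* Assumption: a feature absent from the map is read as 0. *)
let feature_value (s : sample) (i : int) : int =
  match IntMap.find_opt i s.features with
  | Some v -> v
  | None -> 0

let () =
  let s = make_sample 1 [ (3, 5); (42, 1) ] in
  (* prints "label=1 f3=5 f7=0" *)
  Printf.printf "label=%d f3=%d f7=%d\n"
    s.label (feature_value s 3) (feature_value s 7)
```

With this kind of representation, each feature lookup is a logarithmic-time map access rather than a constant-time array access.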

If you know how to make this significantly faster, I am interested.

The interface file, implementation and test files might be of interest.

I don’t claim this code is completely free from bugs.
I found it quite hard to write (or maybe I just wasn’t in my best programming shape).



2x slower than sklearn means 2x slower than a state-of-the-art C implementation, no? Isn’t that about the best performance we should expect to get?

I corrected this sentence: my implementation is about “two orders of magnitude” slower than sklearn.

Ouch… If someone figures out how to optimize this, I would sure like to learn how!