OCaml for Data Science

Hi. Sorry for the late reply. I'm using OCaml for data science at work.

What makes OCaml good/bad for data science (a short summary of your experience with OCaml in this area)?

  • (Good) OCaml is fast.
  • (Good) Static typing prevents many small bugs. For example, Python often fails with a missing-key error on a dict only after a long computation, whereas OCaml catches such mistakes at compile time (when we use records).
  • (Bad) OCaml has fewer machine learning libraries than Python.
  • (Bad) OCaml does not support multicore parallelism at the time of writing.
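
To illustrate the static typing point, here is a minimal sketch using only the standard library. The `sample` record, its fields, and the helper functions are hypothetical names made up for this example. It contrasts a record, where a misspelled field is rejected by the compiler, with a `Hashtbl` keyed by strings, which behaves more like a Python dict and only reveals a missing key at run time.

```ocaml
(* Hypothetical record describing one row of a dataset. *)
type sample = { id : int; score : float }

(* Field access is checked at compile time: writing s.scroe instead of
   s.score would make the compiler reject the program before it runs. *)
let mean_score (samples : sample list) : float =
  match samples with
  | [] -> 0.0
  | _ ->
    let total = List.fold_left (fun acc s -> acc +. s.score) 0.0 samples in
    total /. float_of_int (List.length samples)

(* Dict-like alternative: keys are plain strings, so a missing or
   misspelled key is only noticed when the lookup actually executes. *)
let lookup_score (row : (string, float) Hashtbl.t) : float option =
  Hashtbl.find_opt row "score"

let samples = [ { id = 1; score = 0.5 }; { id = 2; score = 1.5 } ]
let result = mean_score samples
```

With the record version, a typo in a field name surfaces as a compile error before any long computation starts. With the `Hashtbl` version, the same typo compiles fine and `lookup_score` simply returns `None` at run time.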

What are the OCaml alternatives for Python's Pandas, NumPy, SciPy, etc.?

As some people mentioned, Owl is similar to NumPy.

Do you know of any frontier companies/products/projects that use OCaml for data science?

I don’t know. I use OCaml for data science personally. However, my colleagues use their favorite languages, e.g., Java, Python, etc.

Are there any problems related to data science that have been solved on other platforms, but not on OCaml as a platform?

Lack of libraries, multicore support, and scalable distributed-memory processing environments (I know of some opam packages such as rpc_parallel, but I cannot find enough examples).

Maybe you can give me a good piece of advice related to both OCaml and data science.

Jupyter (http://jupyter.org/) is very useful, and it can execute OCaml code through OCaml Jupyter, an OCaml kernel for Jupyter notebooks. A Docker image containing many packages for data science is available in the GitHub repository akabe/docker-ocaml-jupyter-datascience, and some example notebooks are in the notebooks directory of that repository.
Please try them if you are interested.
