It’s also worth noting that there is significantly higher demand for data scientists, i.e. people who know how to apply existing algorithms to real-world problems, than for developers who know how to implement such algorithms.
Data scientists are typically not heavily invested in programming languages and hence go for the easiest languages to learn. That explains a lot about why Python and R became so popular in the field despite arguably being technically inferior along many language dimensions.
I always find it amusing when people cite the lack of multicore support as an explanation for why OCaml is not seeing the adoption it would otherwise deserve, especially in data science. Python and R also lack multicore support at the language level, yet they are at least 2 orders of magnitude more popular in the field. And whenever I cross-validate OCaml numerical code against existing Python code, I typically find that the pure Python solutions run 2 orders of magnitude more slowly; gaps in excess of 3 orders don’t shock me either. Try to beat that by just throwing more cores at the problem!
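To make that concrete, here is the kind of pure, tight numerical loop I have in mind, a made-up dot-product kernel rather than anything from a real benchmark. Compiled to native code, OCaml runs this as a plain machine-code loop, whereas a line-for-line pure Python translation pays interpreter overhead on every arithmetic operation.

```ocaml
(* Naive dot product over float arrays: the kind of inner loop where
   natively compiled OCaml is typically orders of magnitude faster than
   an equivalent pure Python loop. *)
let dot (xs : float array) (ys : float array) =
  let n = Array.length xs in
  assert (n = Array.length ys);
  let acc = ref 0.0 in
  for i = 0 to n - 1 do
    acc := !acc +. (xs.(i) *. ys.(i))
  done;
  !acc

let () =
  let n = 10_000_000 in
  let xs = Array.init n (fun i -> float_of_int i) in
  let ys = Array.init n (fun i -> float_of_int (n - i)) in
  Printf.printf "dot = %.6e\n" (dot xs ys)
```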
Python data science frameworks merely offload computationally demanding tasks to external libraries written in something fast and parallel, and that actually works just fine for most practical problems. This can obviously be done with OCaml, too, but that’s not enough.
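The same division of labour is available on the OCaml side. A minimal sketch, assuming the Owl library is installed via opam; it uses Owl’s dense float64 matrix module, whose matrix product is delegated to an optimized BLAS backend, much like NumPy’s.

```ocaml
(* Offloading the heavy lifting to Owl's dense float64 matrices rather
   than writing the loops by hand in OCaml. *)
module M = Owl.Dense.Matrix.D

let () =
  let a = M.uniform 1000 1000 in
  let b = M.uniform 1000 1000 in
  let c = M.dot a b in                      (* handled by the BLAS backend *)
  Printf.printf "sum of entries = %g\n" (M.sum' c)
```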
The issue with OCaml is that its greatest strength is also its greatest weakness: a powerful static type system. Given a reasonable set of libraries, many if not most practical data science problems can be solved in a few hundred lines of code. That is usually too small for the benefits of static typing to really show. But a data scientist trying to learn OCaml will quickly get annoyed by a “nitpicking” type checker: they feel more productive (even if they aren’t) when they can just run and test stuff.
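A tiny example of the kind of friction I mean. Someone coming from Python writes the obvious averaging function and is immediately told that `(+)` only works on ints, that float lists need `(+.)`, and that integer division truncates:

```ocaml
(* First attempt, Python-style: this compiles, but it only accepts int
   lists (passing [1.0; 2.0] is a type error) and the division truncates. *)
let int_average xs = List.fold_left (+) 0 xs / List.length xs

(* What the type checker pushes you towards for floats instead: *)
let average xs =
  List.fold_left (+.) 0.0 xs /. float_of_int (List.length xs)

let () =
  Printf.printf "%d %f\n" (int_average [1; 2; 4]) (average [1.0; 2.0; 4.0])
```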
If I had to give honest advice to someone who wants to build a career in the field, I’d absolutely not recommend OCaml for data science, and this has nothing to do with the language. The library / framework ecosystem is currently not at the same level as Python’s or R’s, though I consider it adequate for solving most problems. The bigger barriers are non-technical (psychological / social / business) and make it rather unlikely that OCaml will see sufficient adoption in that field in the foreseeable future.
But if you can’t resist, your best bet would probably be niche applications that play to OCaml’s strengths. For example, dealing with highly structured, nested, non-uniform data or model specifications is typically way more difficult in mainstream languages. Not everything in life is adequately captured by tensors and feedforward neural networks; in fact, I believe machine learning research as a field somewhat neglects highly structured problems for that very reason. If you find a killer application for OCaml in that area, you will likely not see much competition for a long time. But it would be a high-risk endeavor.
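To sketch what “highly structured, nested, non-uniform” can look like, here is a hypothetical model-specification type; the constructor names are made up for illustration. An algebraic data type plus exhaustive pattern matching makes this kind of structure cheap to define, traverse, and refactor, which is exactly where mainstream data science languages tend to struggle.

```ocaml
(* A hypothetical, nested, non-uniform model specification expressed as
   an algebraic data type; the compiler checks that every traversal
   handles every case. *)
type model =
  | Linear of { n_features : int }
  | Tree of { max_depth : int }
  | Ensemble of (float * model) list   (* weighted sub-models *)
  | Pipeline of model list             (* stages applied in sequence *)

let rec describe = function
  | Linear { n_features } -> Printf.sprintf "linear(%d)" n_features
  | Tree { max_depth } -> Printf.sprintf "tree(depth<=%d)" max_depth
  | Ensemble members ->
    Printf.sprintf "ensemble(%s)"
      (members
       |> List.map (fun (w, m) -> Printf.sprintf "%.2f*%s" w (describe m))
       |> String.concat " + ")
  | Pipeline stages ->
    Printf.sprintf "pipeline(%s)"
      (stages |> List.map describe |> String.concat " -> ")

let () =
  Pipeline
    [ Ensemble [ (0.7, Tree { max_depth = 6 }); (0.3, Linear { n_features = 20 }) ];
      Linear { n_features = 1 } ]
  |> describe |> print_endline
```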