I would say Julia still wins on momentum in the numerical computing space, by a factor exceeding 20. Owl is a heroic effort to change that, but becoming viable will require gradual improvement based on feedback from a critical mass of regular users. It will not have the manpower to tackle a full GPU autodiff implementation, afaics.
That’s an easy question to answer: how about you implement this “full stack OCaml GPU DL toolchain” you would like to have? See, there is your answer!
Implementing a comprehensive AD framework is a hell of a lot of work. And then what? There is hardly any commercial demand for OCaml, especially in machine learning. If there are no professional prospects for such an undertaking, why bother?
Check this chapter under the ONNX Engine section. In a nutshell, you save your computation in the ONNX format, then load and run it via the Python ONNX runtime. I believe this can be automated and driven from within an OCaml environment via the pyml library.
I think another way to make Owl work with GPUs is this: since Owl can already produce symbolic graphs of computations, it should be possible to write another engine that targets MLIR. Another possibility is to write an implementation of the Ndarrays based on the arrayfire library. Yet another is to utilize spoc somehow.
OpenCL is woefully unused in the industry and lags far behind CUDA. As in other domains, one needs many specialists to develop this stuff.
Additionally, OCaml doesn’t have much of an advantage here, unfortunately. The biggest source of bugs and errors in this domain is tensors and their dimension sizes, and there are just no good type systems that handle this outside of dependent types, AFAIK.
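To make that concrete, here is the kind of bug that slips past ordinary static typing in any mainstream language (a hypothetical NumPy snippet; OCaml’s type system would fare no better, since the shapes are values, not types):

```python
import numpy as np

# A classic tensor-shape bug: nothing in a conventional type system
# stops this, because both operands are just "float arrays".
a = np.ones((3, 4))
b = np.ones((3, 4))  # should have been (4, 3) for a valid matmul

caught = None
try:
    c = a @ b  # inner dimensions 4 and 3 do not match
except ValueError as err:
    caught = str(err)  # the mistake only surfaces at runtime
print(caught)
```

Dependently typed languages can express "matrix of shape (3, 4)" in the type itself and reject this at compile time, which is the point the post is making.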
Additionally, the industry advances extremely rapidly. Bindings to PyTorch and TensorFlow are probably the best we can do.
The main opportunity I see here for OCaml is in managing the processing of structured data. Python is superb for deep learning with arrays and/or strings going in and out, but grim for all other data types. OCaml could really excel here.
I’m a bit worried that, with PyTorch 2.0, the project says it is moving away from C++ and rewriting more of the core components in Python. Apparently there are ways to do that while preserving performance. That could make the C/C++ bindings to PyTorch incomplete or obsolescent. I hope I’m wrong?