Re: education
Languages like OCaml, C++, and Python all have lots of unnecessary hurdles for truly new students that your proposals aren’t going to address. At the introductory level, the research behind How to Design Programs is solid. They have the data to back up their claims, and the results speak for themselves. Students end up better off in a serious language having started off with a better conceptual foundation in a carefully designed training simulator.
If you want to make an ML-family version of that material, I’d be all for promoting it. Between the Racket and Pyret materials, you’d have a lot to draw on. And between OCaml’s ability to compile to JavaScript and the lp parsers, you shouldn’t need a full-on new language so much as a custom syntax.
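For a sense of what that might look like, here is a rough sketch of HtDP’s design recipe (data definition, signature and purpose, examples, template, body) carried over to plain OCaml. The `shape` type and `area` function are made up for illustration; a real curriculum would want its own teaching syntax and a friendlier test harness than `assert`.

```ocaml
(* Data definition: a shape is a circle (radius) or a rectangle (width, height). *)
type shape =
  | Circle of float
  | Rect of float * float

(* Signature: area : shape -> float
   Purpose:   compute the area enclosed by a shape.
   Template:  one match arm per variant in the data definition. *)
let area (s : shape) : float =
  match s with
  | Circle r -> Float.pi *. r *. r
  | Rect (w, h) -> w *. h

(* Examples, written before the body and kept around as tests. *)
let () =
  assert (area (Rect (2., 3.)) = 6.);
  assert (area (Circle 0.) = 0.)
```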
Similarly, I think there are a lot of upper-level courses that are currently taught primarily in Java or C++ that would benefit from having OCaml-based lab materials, and there are at least a few where the most popular textbooks could benefit from competition against a high-quality open textbook.
Re: changing OCaml syntax
In terms of language improvements, I’d rather see actual features like modular implicits or memory layout specifications.
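For context on why modular implicits matter: today, ad hoc overloading in OCaml means passing a first-class module by hand. The sketch below uses made-up names (`SHOW`, `Show_int`, `Show_list`). The modular implicits proposal aims to let the compiler find the `(module ...)` argument itself, but that is not in mainline OCaml, so only the explicit version is shown.

```ocaml
(* A "type class" written as an ordinary module signature. *)
module type SHOW = sig
  type t
  val show : t -> string
end

(* Today the instance has to be passed explicitly as a first-class module. *)
let show (type a) (module S : SHOW with type t = a) (x : a) : string = S.show x

module Show_int = struct
  type t = int
  let show = string_of_int
end

(* Instances compose through functors, but every use site still names them. *)
module Show_list (S : SHOW) = struct
  type t = S.t list
  let show xs = "[" ^ String.concat "; " (List.map S.show xs) ^ "]"
end

module Show_int_list = Show_list (Show_int)

let () =
  print_endline (show (module Show_int) 42);
  print_endline (show (module Show_int_list) [ 1; 2; 3 ])
```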
All languages that get used for serious applications accumulate this stuff over the decades. Fixing it isn’t going to magically change the popularity of OCaml any more than teaching it in school will.
Any capable developer should be able to get up to speed in any language, at least at a basic level, in short order. And it is very hard for me to imagine a scenario for a major project in which the onboarding cost is dominated by the cost of someone learning the language and tooling.
If we get to the point where cleaning this stuff up would get us faster parsing and more flexibility for new features, then we can do what Rust and C++ have done: provide an automated way to move large codebases over to new revisions of the language while keeping full interoperability with files written against the older version.
But I don’t think “of” is anywhere close to that. As for the other proposals, are you sure all of those things actually are equivalent internally, both now and in terms of potential optimizations?
They may be semantically equivalent in some abstract / denotational sense, but one of the benefits of OCaml is that the language is transparent about what the abstractions cost. Theoretically clean syntax makes that level of operational transparency harder to maintain. So, at least some of this is in “feature not bug” land.
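One concrete case of syntax carrying cost information (not necessarily one of the proposals under discussion): a constructor with two arguments and a constructor with a single tuple argument look nearly the same in source but are laid out differently at runtime. The type names here are invented, and `Obj` is used only to peek at the representation.

```ocaml
(* Two constructors that look almost identical in source. *)
type flat = Flat of int * int      (* two arguments stored inline in one block *)
type boxed = Boxed of (int * int)  (* one argument: a pointer to a separate tuple *)

(* Only the boxed form carries a tuple value you can bind and pass around. *)
let pair_of_boxed (Boxed p) = p
let sum_flat (Flat (a, b)) = a + b

(* Obj exposes the runtime layout; use it for inspection only. *)
let () =
  Printf.printf "Flat block size:  %d\n" (Obj.size (Obj.repr (Flat (1, 2))));
  Printf.printf "Boxed block size: %d\n" (Obj.size (Obj.repr (Boxed (1, 2))))
```

In the flat form, writing `Flat p` to grab both fields as one tuple is rejected by the compiler, because that tuple never exists as a value; the boxed form pays an extra allocation and an indirection to make it exist.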
For anything that is purely syntactic sugar and truly vestigial, you could start by amending the documentation to make this clear. And then, for the benefit of those who want it gone, you can make a coding standard enforcement tool that automatically finds and fixes the old usage reliably. That’s the minimum that’s going to be required to even entertain the idea of deprecating the syntax in any event. So if this is high on your priority list, start by taking the necessary steps for yourself.
Re: OCaml vs Python in industry
Most people who use Python use it the same way accountants use Excel: as an application that runs some commands to give them a way to structure some kind of task.
It’s Lego blocks. And I’m not sure that OCaml will ever be able to compete with this because being a user-facing application is a different use case from being a systems programming language.
There are a lot of places where OCaml could get used more and there are lots of points of friction that could be fixed. But that’s got nothing to do with the syntax of the language and everything to do with the ecosystem.
So if this is about popularity, the concern over syntax is misplaced. People don’t love Rust because it has great syntax with orthogonal semantics. People love Rust because of the tooling. It is extremely easy to get set up for a serious project out of the box and to evolve a toy project into a major one.
You could help in this area by improving OCaml’s out-of-the-box Windows experience, documenting how to run a large successful project in OCaml with all the bells and whistles, and making tooling to automatically set everything up. Out-of-the-box testing, packaging, build system, code review, CI, and the rest would be great. You don’t need to reinvent the wheel here. OCaml has great tools. It’s just a matter of ergonomics and automation.
Walk through everything surrounding setting up and running a large project. Automate and standardize as much as possible. Look at what Rust and other languages have done. More time on task vs time spent fiddling with tools is a great selling point for anyone who wants to pitch a “new” language to management.
Similarly, add more documentation and reference materials. Real World OCaml is great. But there’s not enough beyond that. What do the different abstractions cost? What are the design idioms you’ll see in well-written OCaml code? When are those idioms appropriate?
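As an example of the kind of cost question such a reference could answer: a record whose fields are all `float` is stored as one flat block of unboxed doubles, while adding a single non-float field means each float is boxed on its own. A minimal sketch, with made-up type names and `Obj` used purely for inspection:

```ocaml
(* All-float record: stored as a flat block of unboxed doubles. *)
type point = { x : float; y : float }

(* One non-float field changes the layout: each float becomes a separate boxed value. *)
type sample = { value : float; count : int }

let () =
  let p = { x = 1.0; y = 2.0 } in
  let s = { value = 1.0; count = 3 } in
  (* Obj is used here only to look at the representation. *)
  Printf.printf "point is a flat double array: %b\n"
    (Obj.tag (Obj.repr p) = Obj.double_array_tag);
  Printf.printf "sample.value is a boxed float: %b\n"
    (Obj.tag (Obj.field (Obj.repr s) 0) = Obj.double_tag)
```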
Julia benefited from having this stuff documented and widely shared very early in its life cycle. Again, you can copy that. You could write the OCaml Recipes book and all the other stuff that makes it easier for a project to adopt some other language by just having stuff to point to.
The same goes for YouTube content. Nothing stops you or anyone else from making educational and professional development material. There are loads and loads of conference presentations not just for major languages like C++ and Python but even for young, scrappy ones like Julia and Rust.
I would love to come on here and say “hey, everyone should be doing this stuff. It would help the language!” but the reality is that all of this stuff is a massive amount of mostly thankless work. Someone is going to do it either out of passion or because not having it has become enough of a problem for them that it is worth fixing.
Re: data science in particular
Most of the work in a data science program is actually getting and preparing the data. Once it’s properly cleaned and organized, you are just running numbers through some linear algebra (to vastly oversimplify) and probably by calling some low-level library.
In principle, OCaml could be good at this. Data ingestion is mostly about parsing files after all. But someone would have to write the libraries and the tools and they’d have to be better than what’s out there by quite a bit to justify adding another bit of tooling over just writing a few more lines of Python or R.
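To give a sense of what the ingestion side looks like in OCaml, here is a minimal sketch for a hypothetical two-column `name,score` file. It assumes no quoted fields or embedded commas, which a real CSV reader has to handle; the `row` type and the function names are made up.

```ocaml
(* One row of an assumed two-column file: name,score *)
type row = { name : string; score : float }

(* Parse one line; a malformed line yields an Error mentioning its line number. *)
let parse_line (lineno : int) (line : string) : (row, string) result =
  match String.split_on_char ',' line with
  | [ name; score ] -> (
      match float_of_string_opt (String.trim score) with
      | Some score -> Ok { name = String.trim name; score }
      | None -> Error (Printf.sprintf "line %d: bad score %S" lineno score))
  | _ -> Error (Printf.sprintf "line %d: expected 2 fields" lineno)

(* Parse every line, keeping good rows and collecting errors separately. *)
let parse_all (lines : string list) : row list * string list =
  let results = List.mapi (fun i l -> parse_line (i + 1) l) lines in
  let rows = List.filter_map (function Ok r -> Some r | Error _ -> None) results in
  let errors = List.filter_map (function Error e -> Some e | Ok _ -> None) results in
  (rows, errors)
```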
But this isn’t exactly a great use case for strong typing because there just isn’t much in the way of type errors that can crop up. Similarly, I doubt OCaml would make for a good replacement for a lot of 20-line Perl scripts that are littered about every IT department on the planet. You are just dealing with text files and character streams.
In general, it’s probably a bad idea to try to make OCaml “the better X” for any X. Unless there’s some major problem that you can greatly simplify, people will just keep using the popular tools that everyone else does. Standardized jobs with known good solutions aren’t exactly ripe for disruption.
Re: the ultimate way to make OCaml more popular
The tooling and the ecosystem stuff I mentioned above will make using OCaml an easier business case. But ultimately, people use languages to get things done. And languages are usually successful because someone wrote code and tools that people are excited to use for their project because of how good they are and how much easier it makes things.
This ultimately comes down to actual research and being in the fortunate position of having a unique problem that isn’t a good fit for any existing solution.
Off the top of my head, here are some ideas:
In terms of performance, there’s been some recent research that shows that many “enterprise” benchmark workloads that are usually thought of as not suitable for GPU acceleration due to thread divergence can be GPU accelerated if the right adjustments are made to the software. Can you make this automatic for OCaml code?
Relatedly, now that we have multi-core, can you make it easy to take advantage of nested data parallelism and other things that aren’t purely vectorized code?
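For reference, this is roughly what nested (divide-and-conquer) parallelism looks like by hand with the `Domain` module, assuming OCaml 5.x. `par_sum`, the cutoff, and the depth limit are arbitrary choices for the sketch, and libraries such as domainslib provide a task pool so you don’t spawn raw domains like this. The point is how much of the scheduling is still manual.

```ocaml
(* Sequential base case over a.(lo) .. a.(hi - 1). *)
let seq_sum (a : int array) lo hi =
  let s = ref 0 in
  for i = lo to hi - 1 do
    s := !s + a.(i)
  done;
  !s

(* Fork the left half onto a new domain, recurse on the right half in the
   current one, then join. [depth] caps how many levels spawn domains, so at
   most 2^depth - 1 extra domains are created. *)
let rec par_sum depth (a : int array) lo hi =
  if depth <= 0 || hi - lo < 10_000 then seq_sum a lo hi
  else begin
    let mid = lo + ((hi - lo) / 2) in
    let left = Domain.spawn (fun () -> par_sum (depth - 1) a lo mid) in
    let right = par_sum (depth - 1) a mid hi in
    Domain.join left + right
  end

let () =
  let a = Array.init 1_000_000 (fun i -> i) in
  Printf.printf "sum = %d\n" (par_sum 2 a 0 (Array.length a))
```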
Can you build a good library on top of algebraic effects that actually moves the ball beyond Rust and C++ in terms of reasoning about resource usage? (Not everything that gets written in those languages needs to be. It’s just that there is no high-level language that helps you. Since OCaml is still a systems programming language, this stuff is the low-hanging fruit.)
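As a toy starting point (names invented, assuming OCaml 5’s `Effect` module), code can announce resource use with an effect and let the caller’s handler decide the policy. This only counts units at runtime and aborts over budget; it is nowhere near the static reasoning asked for above, but it shows where such a hook would sit.

```ocaml
open Effect
open Effect.Deep

(* Hypothetical effect: "I am about to use n units of some resource". *)
type _ Effect.t += Acquire : int -> unit Effect.t

exception Budget_exceeded of int

(* Run [f] under a budget. The handler, not the callee, owns the policy: it
   tallies usage and aborts the computation once the total exceeds [budget].
   Returns the result together with the units used. *)
let with_budget (budget : int) (f : unit -> 'a) : 'a * int =
  let used = ref 0 in
  let result =
    try_with f ()
      { effc =
          (fun (type b) (eff : b Effect.t) ->
            match eff with
            | Acquire n ->
                Some
                  (fun (k : (b, _) continuation) ->
                    used := !used + n;
                    if !used > budget then discontinue k (Budget_exceeded !used)
                    else continue k ())
            | _ -> None) }
  in
  (result, !used)

(* Code that uses resources just performs the effect. *)
let job () =
  perform (Acquire 3);
  perform (Acquire 4);
  "done"
```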
OCaml makes it easy to have high-quality individual processes, but a modern application is a complex distributed system. However, it’s possible to have a software product that is end-to-end in OCaml from the OS on the various servers all the way through to the app on people’s phones or the code running in their browsers. And there are going to be guarantees you’d like to make about that system, preferably without breaking out a separate model checker. Can you leverage the type system and the other tooling around OCaml to make this easier and more foolproof?
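One small, well-known way to push protocol rules into the type system is phantom types. In this sketch the `Conn` module, its states, and its operations are hypothetical; the point is that calling `Conn.send` on a connection that has not done its handshake is a compile-time error rather than a runtime one.

```ocaml
module Conn : sig
  type fresh (* phantom state: opened, no handshake yet *)
  type ready (* phantom state: handshake completed *)
  type 'state t

  val connect : string -> fresh t
  val handshake : fresh t -> ready t
  val send : ready t -> string -> ready t
  val peer : _ t -> string
  val sent : _ t -> string list
end = struct
  type fresh
  type ready
  type 'state t = { peer : string; log : string list }

  let connect peer = { peer; log = [] }

  (* A real implementation would exchange messages here. *)
  let handshake c = { peer = c.peer; log = c.log }
  let send c msg = { c with log = msg :: c.log }
  let peer c = c.peer
  let sent c = List.rev c.log
end

(* Conn.send (Conn.connect "db.internal") "hi" would not type-check. *)
let () =
  let c = Conn.handshake (Conn.connect "db.internal") in
  let c = Conn.send c "hello" in
  print_endline (String.concat ", " (Conn.sent c))
```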
What about security guarantees in particular? This is particularly difficult to do in an ergonomic way. But since most security problems fit into patterns, it is in principle possible to automatically rule out large swaths of them. (A while back I saw a presentation from Microsoft that showed that their own tools were capable of catching ~80% of the bugs in Windows, but people didn’t consistently use the tools or ignored and overrode them.)
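A classic example of ruling out a whole bug class by construction: if page-building code only accepts an abstract `Html.t`, untrusted strings cannot reach the output without passing through `escape`. This is a minimal sketch with a hypothetical `Html` module, not any particular library’s API; the remaining escape hatch, `literal`, is the one place that needs review.

```ocaml
module Html : sig
  type t
  val escape : string -> t  (* untrusted text becomes safe markup *)
  val literal : string -> t (* trusted, developer-written markup only *)
  val concat : t list -> t
  val to_string : t -> string
end = struct
  type t = string

  (* Replace the characters that are significant in HTML text and attributes. *)
  let escape s =
    let b = Buffer.create (String.length s) in
    String.iter
      (function
        | '&' -> Buffer.add_string b "&amp;"
        | '<' -> Buffer.add_string b "&lt;"
        | '>' -> Buffer.add_string b "&gt;"
        | '"' -> Buffer.add_string b "&quot;"
        | '\'' -> Buffer.add_string b "&#39;"
        | c -> Buffer.add_char b c)
      s;
    Buffer.contents b

  let literal s = s
  let concat parts = String.concat "" parts
  let to_string t = t
end

(* Only Html.t goes in, so raw user input cannot reach the page unescaped. *)
let greeting (user_name : string) : Html.t =
  Html.concat
    [ Html.literal "<p>Hello, "; Html.escape user_name; Html.literal "</p>" ]
```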
More generally, formal verification has come a long way, and there are probably a lot of special cases where you can fix 80% of some practical problem without requiring full-on dependent typing. So there are lots of opportunities here.
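GADTs are one example of getting invariants out of OCaml’s existing type system without full dependent types. In this textbook-style sketch (constructor names are illustrative), `eval` can only be applied to well-typed expressions, so there is no runtime type check and no “wrong type” error case to handle.

```ocaml
(* The type index records what each expression evaluates to. *)
type _ expr =
  | Int : int -> int expr
  | Bool : bool -> bool expr
  | Add : int expr * int expr -> int expr
  | Eq : int expr * int expr -> bool expr
  | If : bool expr * 'a expr * 'a expr -> 'a expr

(* Each branch refines [a], so no tagging or runtime type check is needed. *)
let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Bool b -> b
  | Add (x, y) -> eval x + eval y
  | Eq (x, y) -> eval x = eval y
  | If (c, t, e) -> if eval c then eval t else eval e

(* Add (Int 1, Bool true) is rejected at compile time. *)
let () = print_int (eval (If (Eq (Int 1, Int 1), Add (Int 2, Int 3), Int 0)))
```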