OCaml non orthogonal features (was Simplification of OCaml as a design goal?)

MaxHaydenChiz · July 26, 2022, 1:05am

Re: education,

Languages like OCaml, C++, and Python all have lots of unnecessary hurdles for truly new students that your proposals aren’t going to address. At the introductory level, the research behind How to Design Programs is solid. They have the data to back up their claims, and the results speak for themselves. Students end up better off in a serious language having started off with a better conceptual foundation in a carefully designed training simulator.

If you want to make an ML-family version of that material, I’d be all for promoting it. Between the Racket and Pyret materials, you’d have a lot to draw on. And between OCaml’s ability to compile to JavaScript and the lp parsers, you shouldn’t need to make a full-on language as much as a custom syntax.

Similarly, I think there are a lot of upper level courses that are currently taught primarily in Java or C++ that would benefit from having OCaml-based lab materials, and there are at least a few where the most popular textbooks could benefit from competition against a high quality open textbook.

Re: changing OCaml syntax

In terms of language improvements, I’d rather see actual features like modular implicits or memory layout specifications.

All languages that get used for serious applications accumulate this stuff over the decades. Fixing it isn’t going to magically change the popularity of OCaml any more than teaching it in school will.

Any capable developer should be able to get up to speed in any language, at least at a basic level, in short order. And it is very hard for me to imagine a scenario for a major project in which the on-boarding cost is dominated by the cost of someone learning the language and tooling.

If we get to the point where we can get faster parsing and have more flexibility for new features by cleaning this stuff up, then we can do what Rust and C++ have done and have an automated way to move large code-bases over to new revisions of the language while keeping full interoperability between files written to the older version.

But I don’t think “of” is anywhere close to that. As for the other proposals, are you sure all of those things actually are equivalent internally? Both now and in terms of potential optimizations?

They may be semantically equivalent in some abstract / denotational sense, but one of the benefits of OCaml is that the language is transparent about what the abstractions cost. Theoretically clean syntax makes that level of operational transparency harder to maintain. So, at least some of this is in “feature not bug” land.
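
One well-known case of this, offered as an illustration and not necessarily one of the proposals under discussion: a constructor declared with `of int * int` and one declared with `of (int * int)` look interchangeable but are laid out differently in memory. The sketch below assumes the standard native or bytecode compiler; the type names are made up, and `Obj` is used only to inspect the representation.

```ocaml
(* Two declarations that differ only by a pair of parentheses.
   The type names are hypothetical. *)
type flat = Flat of int * int      (* constructor with two arguments *)
type boxed = Boxed of (int * int)  (* constructor with one tuple argument *)

(* Number of fields in the heap block that represents a value.
   Obj is for inspection only; avoid it in ordinary code. *)
let fields v = Obj.size (Obj.repr v)

let () =
  (* Flat stores both integers directly in its own block. *)
  Printf.printf "Flat:  %d field(s)\n" (fields (Flat (1, 2)));
  (* Boxed stores a single pointer to a separately allocated tuple. *)
  Printf.printf "Boxed: %d field(s)\n" (fields (Boxed (1, 2)))
```

This reports two fields for `Flat` and one for `Boxed`, so the two spellings cost a different number of allocations and indirections even though they carry the same data.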

For anything that is purely syntactic sugar and truly vestigial, you could start by amending the documentation to make this clear. And then, for the benefit of those who want it gone, you can make a coding standard enforcement tool that automatically finds and fixes the old usage reliably. That’s the minimum that’s going to be required to even entertain the idea of deprecating the syntax in any event. So if this is high on your priority list, start by taking the necessary steps for yourself.

re: OCaml vs Python in industry

Most of the people using Python are using Python in the same way that accountants use Excel: as an application that runs some commands to give them a way to structure some kind of task.

It’s Lego blocks. And I’m not sure that OCaml will ever be able to compete with this because being a user-facing application is a different use case from being a systems programming language.

There are a lot of places where OCaml could get used more and there are lots of points of friction that could be fixed. But that’s got nothing to do with the syntax of the language and everything to do with the ecosystem.

So if this is about popularity, the concern over syntax is misplaced. People don’t love Rust because it has great syntax with orthogonal semantics. People love Rust because of the tooling. It is extremely easy to get set up for a serious project out of the box and to evolve a toy project into a major one.

You could help in this area by improving OCaml’s out of the box Windows experience, documenting how to run a large successful project in OCaml with all the bells and whistles, and making tooling to automatically set everything up. Out of the box testing, packaging, build system, code review, CI, and the rest would be great. You don’t need to reinvent the wheel here. OCaml has great tools. It’s just a matter of ergonomics and automation.

Walk through everything surrounding setting up and running a large project. Automate and standardize as much as possible. Look at what Rust and other languages have done. More time on task vs time spent fiddling with tools is a great selling point for anyone who wants to pitch a “new” language to management.

Similarly, add more documentation and reference materials. Real World OCaml is great. But there’s not enough beyond that. What do the different abstractions cost? What are the design idioms you’ll see in well written OCaml code? When are those idioms appropriate?

Julia benefited from having this stuff documented and widely shared very early in its life-cycle. Again, you can copy that. You could write the OCaml Recipes book and all the other stuff that makes it easier for a project to adopt some other language by just having stuff to point to.

The same goes for YT content. Nothing stops you or anyone else from making educational and professional development material. There are loads and loads of conference presentations not just for major languages like C++ and Python but even for young scrappy ones like Julia and Rust.

I would love to come on here and say “hey, everyone should be doing this stuff. It would help the language!” but the reality is that all of this stuff is a massive amount of mostly thankless work. And either someone is going to do it out of passion or because not having it has become enough of a problem for them that it is worth fixing.

Re: data science in particular

Most of the actual work on a data science program is actually getting and preparing the data. Once it’s properly cleaned and organized, you are just running numbers through some linear algebra (to vastly oversimplify) and probably by calling some low level library.

In principle, OCaml could be good at this. Data ingestion is mostly about parsing files after all. But someone would have to write the libraries and the tools and they’d have to be better than what’s out there by quite a bit to justify adding another bit of tooling over just writing a few more lines of Python or R.

But this isn’t exactly a great use case for strong typing because there just isn’t much in the way of type errors that can crop up. Similarly, I doubt OCaml would make for a good replacement for a lot of 20 line Perl scripts that are littered about every IT department on the planet. You are just dealing with text files and character streams.

In general, it’s probably a bad idea to try to make OCaml “the better X” for any X. Unless there’s some major problem that you can greatly simplify, people will just keep using the popular tools that everyone else does. Standardized jobs with known good solutions aren’t exactly ripe for disruption.

Re: the ultimate way to make OCaml more popular

The tooling and the ecosystem stuff I mentioned above will make using OCaml an easier business case. But ultimately, people use languages to get things done. And languages are usually successful because someone wrote code and tools that people are excited to use for their project because of how good they are and how much easier it makes things.

This ultimately comes down to actual research and being in the fortunate position of having a unique problem that isn’t a good fit for any existing solution.

Off the top of my head here are some ideas:

In terms of performance, there’s been some recent research that shows that many “enterprise” benchmark workloads that are usually thought of as not suitable for GPU acceleration due to thread divergence can be GPU accelerated if the right adjustments are made to the software. Can you make this automatic for OCaml code?

Relatedly, now that we have multi-core, can you make it easy to take advantage of nested data parallelism and other things that aren’t purely vectorized code?

Can you build a good library on top of algebraic effects that actually moves the ball beyond Rust and C++ in terms of reasoning about resource usage? (Not everything that gets written in those languages needs to be. It’s just that there is no high level language that helps you. Since OCaml is still a systems programming language, this stuff is the low hanging fruit.)

OCaml makes it easy to have high quality individual processes, but a modern application is a complex distributed system. However, it’s possible to have a software product that is end-to-end in OCaml from the OS on the various servers all the way through to the app on people’s phones or the code running in their browsers. And there are going to be guarantees you’d like to make about that system, preferably without breaking out a separate model checker. Can you leverage the type system and the other tooling around OCaml to make this easier and more foolproof?

What about security guarantees in particular? This is particularly difficult to do in an ergonomic way. But since most security problems fit into patterns, it is in principle possible to automatically rule out large swaths of them. (A while back I saw a presentation from Microsoft that showed that their own tools were capable of catching ~80% of the bugs in Windows, but people didn’t consistently use the tools or ignored and overrode them.)

More generally, formal verification has come a long way and there are probably a lot of special cases for fixing 80% of some practical problem that doesn’t require full-on dependent typing. So there are lots of opportunities here.

MaxHaydenChiz · July 26, 2022, 2:07am

I’ve never actually given serious consideration to using OCaml for numerical computing. I just don’t see how the kinds of problems that arise are going to be helped by the type checker.

But, I think the suggestion to take a body of code and try to rewrite it in OCaml is actually a good one. That’s a great way to find points of friction and to come up with new research ideas. And “use things in ways not originally contemplated” is a generally good heuristic.

With numeric code, maybe there is some type-theory thing that would really make my life easier by automatically handling lots of floating point corner cases. Or at least yelling at me before my long expensive computation crapped out due to some corner case I forgot to test.
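
A minimal sketch of what the "yelling at me" half could look like with features OCaml already has, assuming a hypothetical `Finite` module (not an existing library): a private type that can only be constructed from finite floats, so NaN and infinities are caught where they first appear.

```ocaml
(* Hypothetical module: a float that is known to be finite. *)
module Finite : sig
  type t = private float            (* readable as a float, not forgeable *)
  val of_float : float -> t option  (* None for nan, infinity, neg_infinity *)
  val div : t -> t -> t option      (* None when the quotient is not finite *)
end = struct
  type t = float
  let of_float x = if Float.is_finite x then Some x else None
  let div a b = of_float (a /. b)
end

(* Render a checked result; the coercion (x :> float) is free at runtime. *)
let describe = function
  | Some (x : Finite.t) -> Printf.sprintf "ok %g" (x :> float)
  | None -> "rejected"

let one = Option.get (Finite.of_float 1.0)
let zero = Option.get (Finite.of_float 0.0)

let () =
  print_endline (describe (Finite.div one one));
  print_endline (describe (Finite.div one zero))
```

This only moves the check to the boundary of each operation. It does not prove anything statically about the computation, which is the harder research question being raised.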

And maybe there’s some ideas beyond modular implicits that would make doing data science-y stuff easier without having to give up static typing.

Maybe for other types of code, you’ll find something similar. Take some telecom system written in Erlang. Or some complex wait-free data structure thing in Rust or C++. Or some web app. Or something else that OCaml ought to be able to do well and where you think type theory could actually improve productivity and reliability. Then write up the OCaml code and figure out how to make it actually ergonomic.

(I focus on type theory b/c that’s what OP’s background is. But the same rule applies for anyone else wanting to apply their thing to make OCaml better.)

P.S. Re: Python being complex. I am convinced it is actually the most complicated language out there right now. Far more than C++. Makes GHC look like child’s play. Absolutely terrifying to be running a bunch of critical code without having a good way to efficiently understand all of the crazy things going on to make that Jenga tower not fall over.

Chet_Murthy · July 26, 2022, 3:27am

Truth be told: I don’t know myself, because I haven’t written enough numerical code (yet). But, some (possibly erroneous) data-points:

Different eigensolver implementations produce results in different forms: typically matrices are row-major, but the ones in numpy/scipy (e.g. scipy.linalg.eig) present the eigenvectors as the columns. By contrast, PySCF’s Davidson eigensolver (pyscf.lib.linalg_helper.davidson) presents the eigenvectors as the rows – more precisely, it returns a list of eigenvectors, but that’s very similar to a matrix, where the rows are the eigenvectors.

Gosh, that’s confusing. Reminds me a lot of how in ML-like languages, we can use abstract data-types to help us not mistake a list for a stack, etc, etc.
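
To make the abstract-data-type idea concrete, here is a small sketch in which the layout of an eigenvector collection is part of its type. The `Eigvecs` module and all of its names are hypothetical; it is not modelled on any existing OCaml numerics library.

```ocaml
(* Hypothetical module: rows-vs-columns is tracked by a phantom type
   parameter, so the two layouts cannot be confused silently. *)
module Eigvecs : sig
  type by_rows
  type by_cols
  type 'layout t
  val of_rows : float array array -> by_rows t  (* each row is one vector *)
  val of_cols : float array array -> by_cols t  (* each column is one vector *)
  val to_rows : by_cols t -> by_rows t          (* explicit transpose *)
  val vector : by_rows t -> int -> float array  (* i-th eigenvector *)
end = struct
  type by_rows
  type by_cols
  type 'layout t = float array array
  let of_rows m = m
  let of_cols m = m
  let to_rows m =
    let n_rows = Array.length m in
    let n_cols = if n_rows = 0 then 0 else Array.length m.(0) in
    Array.init n_cols (fun j -> Array.init n_rows (fun i -> m.(i).(j)))
  let vector m i = Array.copy m.(i)
end

(* A column-layout result: the eigenvectors are (1, 2) and (3, 4). *)
let cols = Eigvecs.of_cols [| [| 1.; 3. |]; [| 2.; 4. |] |]

(* Eigvecs.vector cols 0 is rejected by the type checker;
   the conversion has to be written out. *)
let first = Eigvecs.vector (Eigvecs.to_rows cols) 0
```

The runtime representation is the same plain array in both cases. Only the type differs, which is enough to turn the numpy/PySCF mix-up into a compile-time error.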

My experience with Rust is that in fact, type-checking is wildly effective at helping to control the complexity of the code. Have a look at the sprs package (sparse matrices): types all over the place, and they really do help make the code more comprehensible, and prevent you from making mistakes.

Now, to back up:

“take a body of code and try to rewrite it in OCaml”

Nobody learns about how useful a language is in a new area, until somebody steps up and does this, gets their knuckles skinned, tries again, etc, etc, and finally produces something useful. It’s a learning process, going thru the pain.

And something else: I moved from PL/type theory to industrial software, working in transaction-processing with FIRE (financial, insurance, real estate) customers for IBM. As old-skool as it gets, unless you’re writing nuclear bomb simulations. And one of the things I learned, was that PL isn’t good for anything by itself: its utility comes about as methodology for solving real problems. And so, if your PL knowledge is good, you’re going to be better than the next guy at solving problems in the domain to which you apply yourself.

And if you’re lucky and have the time and freedom, your experience of working in that domain will feed back into your PL work. More people should tackle these other-domain problems, and report back on what they learn: specifically, on the bruises, scrapes, and puncture wounds they suffer. It’ll help OCaml.

Chet_Murthy · July 26, 2022, 5:11am

Can I just say, quite to the contrary. I’ve written a ton, and I. Mean. A. Ton. of perl5 code. The 1996 Atlanta Olympics app-server was 100% O-O perl5. I wrote an object-relational mapper for Java<->DB2, in Perl. I wrote a significant part of an aspect-oriented code-injection toolkit in Perl5. And tons of other stuff, all over the place. Systematically, I think about Perl as dynamically-typed Scheme, and I am always aware of the types. So why don’t I write in OCaml? Because Perl’s support for strings and regular expressions is wildly superior. That, and I don’t have to write down types. I have a dream of writing a “progressive typing” system for Perl, that would take in Perl, and output OCaml. And again, I’m a massive Perl bigot, always choosing Perl first when I have a problem to solve.

I really do feel strongly: don’t assume that you can’t use strongly-type languages for these tasks; instead, ask why strongly-typed languages aren’t as labor-efficient, and how to change that.

This is another way of saying the same thing I said in my other note: when we decide that OCaml is unsuited for some particular application, that’s a self-fulfilling prophecy. Please, don’t let yourself get caught up in that.

“Most of the people using Python are using Python in the same way that accountants use Excel: as an application that runs some commands to give them a way to structure some kind of task.”

This is exactly right! Somebody, a long time ago, wrote tools like numpy and scipy, and because they required little up-front learning to use, they were able to teach lots of newbies how to use them. This, by the way, is how JavaScript got so pervasive: I remember in 2001, that people in the industry were talking about how JS was the way in for newbies with little CS knowledge. Similarly, I’ve heard a story that the guy who started Node.js started it because he knew JS and wanted to write servers. From small beginnings, is what I’m saying.

MaxHaydenChiz · July 26, 2022, 5:55am

I don’t think we fundamentally differ about Perl here. In an ideal world, our type inference would be so good and the trade-offs so few that you’d always be able to choose a strong, statically typed language.

My statement was more about where we are right now and where the best leverage points are.

Reduce friction across the board. And come up with new use cases that fundamentally change how some problems can be solved for the better.

I think the OCaml community as a whole has done a wonderful job in recent years. And I hope to see that continue to improve. I’d love to be able to use OCaml in more projects and it has already gotten easier to do so. But, generally speaking, old languages don’t get replaced. New code for new problems gets written and gradually becomes bigger and more prominent. Old code shrinks relative to the total. It doesn’t go away and still grows, just at a much slower rate.

fccm · July 26, 2022, 3:26pm

What made Python popular and its community grow is in large part that it was presented as a programming language that is easy and simple to use.

In my opinion OCaml is even easier to use than Python when you only use the basic features, which are enough in most cases. We would only have to claim it.

OCaml is used by several companies that rely on its safety in critical applications, so it’s natural that OCaml is presented as a good choice for them, because they make a living with this. For people who would like to see the OCaml community grow rather than stay a “niche”, this is maybe not the best choice.

For years, documentation about OCaml was also very academic, written by and for PhDs and scientists. There is nothing wrong with that, but for those who want to see the community and the ecosystem grow, it is maybe not the best strategy.

Python became popular and successful because there are a lot of tutorials targeting early beginners, and they claim from the beginning that Python is very easy to use.
But if we do that, what will the customers of the companies that use OCaml for its strong safety and other similar features think? “Are you using a toy for our critical application?”

It is probably not that easy to explain that using an easy/simple language will help you make fewer errors, and help you focus on the task rather than on how to do it.
It is probably not that easy to explain that a language can be a good pick for beginners who just want something easy to use, and at the same time for very high-level programmers working on critical applications and high-level problems.

mro · July 26, 2022, 4:14pm

this is a big one, indeed. Let me start: OCaml is a very friendly choice for your programming tasks. OCaml won’t let you down.

Paul_A_Steckler · July 27, 2022, 10:55pm

As I like to say: Java, nein.

Jon_Harrop · July 31, 2022, 5:41pm

Re: OCaml for numerics and data science

I’ve been using OCaml and OCaml-like languages for this for decades and there are various ways it is advantageous:

Typing isn’t just about data but also functions and, in particular, combinators. Optimisation functions are an incredibly useful example here. I want to write let x, y = minimize f (x0, y0) to minimize f wrt its two arguments.
Interactive functions like plot can use type information. Look at the plot atan2 example here where the static type of the argument determines the kind of chart used (two real arguments ⇒ 2D heatmap). Would be great if OCaml could do this.
Interactive development and execution is crucially important.
IO is really important. I ran deep learning on 50GiB of technical news recently and the best tool I found to download the data was OCaml.
Fast serialization can be important and Marshal does the job 10-100x faster than alternatives I have tried.
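
To illustrate the first point, here is a sketch of a `minimize` combinator with the call shape used above. It is not the author's implementation: the algorithm (a simple compass search), the optional `step` and `tol` parameters, and the choice to pass the two arguments as a pair are all assumptions made for the example.

```ocaml
(* Hypothetical derivative-free minimiser (compass search) for a function
   of two floats. Returns the best point found. *)
let minimize ?(step = 1.0) ?(tol = 1e-9) (f : float * float -> float) (x0, y0) =
  let rec go (x, y) step =
    if step < tol then (x, y)
    else begin
      let here = f (x, y) in
      let neighbours =
        [ (x +. step, y); (x -. step, y); (x, y +. step); (x, y -. step) ]
      in
      (* Keep the best neighbour, if any improves on the current point. *)
      let best_point, best_value =
        List.fold_left
          (fun (bp, bv) p ->
            let v = f p in
            if v < bv then (p, v) else (bp, bv))
          ((x, y), here) neighbours
      in
      if best_value < here then go best_point step
      else go (x, y) (step /. 2.0)  (* no improvement: refine the grid *)
    end
  in
  go (x0, y0) step

(* A bowl whose minimum is at (1, -2). *)
let bowl (x, y) =
  let sq v = v *. v in
  sq (x -. 1.0) +. sq (y +. 2.0)

let x, y = minimize bowl (0.0, 0.0)
```

The inferred type, `?step:float -> ?tol:float -> (float * float -> float) -> float * float -> float * float`, documents the contract: passing a function with the wrong arity or a starting point with the wrong shape is a compile-time error.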
