Applied numerical algebra, and type systems

A couple of threads ago, during a discussion about why OCaml was losing out to Python, someone remarked that if OCaml were to be used for applied numerical algebra, it would need a type system that allowed dimensionality checking. I have a friend who argues the same thing, so the view isn’t completely rare.

I thought I’d push back against that view, and also against the view that languages like OCaml aren’t useful for that sort of work.

  1. If you look at most of the languages people use for matrix-crunching, they don’t have dimension-checking. Most of the time, what checking they do have happens at runtime. So OCaml lacking that wouldn’t be a problem, or at least it wouldn’t be a disadvantage.

The real blocker here is getting the OCaml code you’d write to be sufficiently succinct and transparent, while still accessing all those low-level libraries (BLAS, LAPACK, ATLAS, etc.).

  2. One of the most widely-used packages, numpy/scipy, doesn’t have any static typechecking at all.

Moving on to OCaml’s suitability, I would encourage you all to look at pyscf: a computational chemistry package built on numpy/scipy. It’s got all sorts of complex data structures that are not matrices, representing molecules, orbitals, all that jazz. Stuff I haven’t a clue about (what is “second quantization”? damfino!). And those sorts of data structures should be mother’s milk for OCaml.

1 Like

“First quantization is a mystery, but second quantization is a functor”–Edward Nelson

Seriously, “second quantization” refers to a particular choice of Hilbert space that allows one to add and remove particles (electrons, for the chemists) and keep track of the Pauli principle.

2 Likes

grin someday when I get thru learning enough about quantum computing to be useful, I’ll start learning about quantum chemistry …

Regarding dimensionality checking: while it is true that most dimension checking in Fortran etc. is performed at runtime, it would be helpful to have an even richer structure. Matrices are often tensors, i.e. there are rules for which indices can be contracted. There are various notations for type-checking pencil-and-paper calculations, and it would be useful to have this in a numerical language. F# has nice unit checking that could be generalized.
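For readers who haven’t seen the trick, here is a minimal sketch of how phantom types can give F#-style unit checking in OCaml. The quantity type and the unit names are invented for the example; this is not any existing library.

```ocaml
(* Phantom-type units: the unit lives only in the type, not the value. *)
type 'unit quantity = Q of float
type metre
type second

(* Addition is only allowed between quantities with the same unit. *)
let add (Q a : 'u quantity) (Q b : 'u quantity) : 'u quantity = Q (a +. b)

let distance : metre quantity = Q 3.0
let time : second quantity = Q 2.0

let _ok = add distance distance
(* let _bad = add distance time   <- rejected by the type checker *)
```

Multiplication and division are where this gets painful, since you then need type-level products and quotients of units; that is presumably part of what a useful generalization of F#’s built-in units would have to solve.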

Sure, and we agree. My point is that the lack of this isn’t what prevents adoption for numerical computing, b/c the big category-killer in that space doesn’t have any such richer typing.

2 Likes

Yes, precisely. But OCaml doesn’t have much to offer in this space. That was my point. How will you get people to switch? Now it’s true that the rest of the system benefits from type checking, but your main data stream cannot be type-checked effectively.

Right. Why bother moving to OCaml when the equivalent operations are far more complex to write? Numpy is incredibly succinct with its slicing operators, and they’re mostly intuitive. Basically, there has to be a strong incentive to move an ecosystem to OCaml, and I just don’t really see it yet. Owl is a nice try and I think Owl_base fills an important niche in the ecosystem, but it’s very far from being intuitive.

Sorry, I don’t know anything about that. I work in deep learning.

From what I understand, people are mostly unhappy with F#'s unit checking. Can’t tell you why exactly, but that’s the impression I have.

1 Like

That’s interesting. Is it considered too intrusive/tedious, or do people see no benefit? I haven’t used F# apart from playing in the REPL, but I’m in the middle of grading physics exams and a lot of students would have benefited from some unit checking :wink:

I like OCaml a lot, and I have some background in scientific computing and applied numerical linear algebra. I don’t see much use for OCaml in this domain. As noted, Python, or really a Python veneer over C++ and Fortran, is dominant. The main competitor I see, Julia, is also far behind the dominant player, but it has been in that domain for years now and is focused on that domain. Some basic affordances that OCaml currently lacks include control over memory layout (I saw Stephen Dolan’s talk; I don’t expect to see that in OCaml in < 10 years) and overloading. Julia has both (multimethods are more than overloading), and so much more, and yet it still has barely had an impact on Python/C++ dominance.

Jeremy Howard of FastAI gave the keynote at JuliaCon this year, and his perspective, as someone who likes Julia much more than Python but still uses Python, was informative. He divided languages into two classes, those with significant runtimes (Python, Go, Java) and those without (C, C++, Rust), and suggested that Julia would do well to learn from Go, especially its fast compilation and its ability to deliver standalone executables. If you know Julia, you know those are currently weaknesses. I think a similar analysis for OCaml looks better for its future, not because I think OCaml has good chances in numerics, but because OCaml 5 looks competitive with Go itself (memory model, concurrency, effect system), and its weaknesses vis-a-vis Go (syntax, incidental complexity, poor docs, etc.) are not too difficult to overcome when building distributed systems, which is Go’s wheelhouse.

2 Likes

I don’t want to divert too much into the other thread (“what I dislike about OCaml”), but even OCaml 5 has very little chance of competing against Go. A few things people who like Go will absolutely hate about OCaml (5), and that are not close to being addressed, are listed in the other thread:

  • an extensive stdlib with good support for networking: they have http client and server, cryptography, and all sorts of formats are supported (base64, json, etc.). We have literally none of these in the stdlib.

  • one tool (“go”) that does almost everything, including (if I understand correctly) some form of package management. We… don’t have that.

  • my own personal critique: Go has interfaces, which are just a weak-sauce version of OCaml’s objects; but it uses them well in its stdlib, which is also why it’s so widely used. We don’t have Reader or Writer or anything like that, so composability of OCaml’s standard IO is abysmal. Eio is addressing that by picking objects for its channels, but it’s not standard.

    Maybe if Eio gets “blessed” as semi-standard, it’ll be a good solution to that particular pain point. (There’s a toy sketch of the Reader/Writer idea right after this list.)
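To make the Reader/Writer point concrete, here is a toy sketch (my own illustration, not Eio’s actual interface) of how structural object types give Go-style composability: anything with a compatible write method can be wrapped or passed along.

```ocaml
(* A Go-ish Writer as an OCaml object type: purely structural. *)
class type writer = object
  method write : string -> unit
end

(* A writer that forwards to stdout. *)
let stdout_writer : writer =
  object
    method write s = print_string s
  end

(* A combinator: wrap any writer, prefixing every write. *)
let prefixed prefix (w : #writer) : writer =
  object
    method write s = w#write (prefix ^ s)
  end

let () = (prefixed "[info] " stdout_writer)#write "hello\n"
```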

As for competing with python in numerics: imho, no chance. OCaml’s notations are very rigid (which is often nice), there’s no slicing except with the crazy looking .%{;…} (which allocates an array), and generally method notation doesn’t work (unless you use objects). All of that means people used to python will recoil in horror.
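For readers who haven’t run into that syntax, here is a minimal sketch of the user-definable index operators (available since OCaml 4.06), with a plain float array standing in for a real ndarray.

```ocaml
(* [v.%{i}] desugars to [( .%{} ) v i]; the multi-index variant goes
   through ( .%{;..} ), which receives the indices packed into an
   array, hence the allocation mentioned above. *)
let ( .%{} ) (v : float array) i = v.(i)
let ( .%{}<- ) (v : float array) i x = v.(i) <- x

let () =
  let v = Array.make 5 0. in
  v.%{3} <- 42.;
  Printf.printf "%g\n" v.%{3}
```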

2 Likes

I can’t disagree with your observations, at least, not much. So the “not much” would be:

I think numpy/scipy make it easy to start numerical computing. Really easy to start. But the fact that it’s Python makes it harder to continue, as your problems get big. So for instance, in the work I’ve been doing, I found:

  1. {sparse matrix}*{dense vector} matrix multiply was slow, b/c serial, and a bottleneck

    I replaced scipy’s M @ x with Rust sparse-matrix code that used rayon parallelization

  2. ditto “a * x + y”, “a * x” (x, y dense vector)

    Same as above, but then I found a package numexpr that does the same (and a little better) [more on this below]

  3. yesterday, in an eigenvector extraction code, there was a “preconditioner” that needed to take a vector and “regularize” it by replacing values too close to zero, with a chosen nonzero value, so that you could then take a quotient with another vector. I’ll write that out:

a. let f x = if Float.abs x < tolerance then tolerance else x
b. apply f to a vector v – call this “vectorize f v”
c. then compute dx / (vectorize f v)

these sorts of oh-so-slightly-complex expressions have no support in numpy/scipy.
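For comparison, here is roughly what step 3 could look like in OCaml; a minimal sketch over plain float arrays (the names regularize and precondition are mine, and a real version would work on Bigarray/Owl matrices rather than arrays).

```ocaml
(* Replace entries too close to zero with the tolerance itself, so the
   subsequent element-wise division is safe. *)
let regularize ~tolerance v =
  Array.map (fun x -> if Float.abs x < tolerance then tolerance else x) v

(* dx ./ regularize v, element-wise. *)
let precondition ~tolerance dx v =
  Array.map2 ( /. ) dx (regularize ~tolerance v)
```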

So a cottage industry of hacks has sprung up, trying to work around various problems with Python. And sure, that’s great. But boy howdy it’s a PITA having to cobble stuff together. And knowing that I could do this all in one language (and with OCaml 5.0, maybe even get SMP exploitation) is … frustrating (since in this domain, ain’t nobody gonna switch).
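To make the SMP remark concrete, here is a rough, untested sketch of the item-1 bottleneck in OCaml 5: a CSR sparse-matrix times dense-vector product with the rows split across domains. The csr record and the chunking scheme are my own simplification, not any particular library’s.

```ocaml
(* CSR storage: row_ptr has rows + 1 entries; the nonzeros of row i sit
   at positions row_ptr.(i) .. row_ptr.(i+1) - 1 of col_idx/values. *)
type csr = {
  rows : int;
  row_ptr : int array;
  col_idx : int array;
  values : float array;
}

(* y.(i) = sum of values.(k) *. x.(col_idx.(k)) over row i's nonzeros. *)
let spmv_rows m x y lo hi =
  for i = lo to hi - 1 do
    let s = ref 0. in
    for k = m.row_ptr.(i) to m.row_ptr.(i + 1) - 1 do
      s := !s +. m.values.(k) *. x.(m.col_idx.(k))
    done;
    y.(i) <- !s
  done

(* Split the row range across [ndomains] domains; each writes a disjoint
   slice of y, so joining the domains is the only synchronization. *)
let par_spmv ~ndomains m x =
  let y = Array.make m.rows 0. in
  let chunk = (m.rows + ndomains - 1) / ndomains in
  let workers =
    List.init ndomains (fun d ->
        let lo = d * chunk and hi = min m.rows ((d + 1) * chunk) in
        Domain.spawn (fun () -> spmv_rows m x y lo hi))
  in
  List.iter Domain.join workers;
  y
```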

2 Likes

Golang … jesus. There’s a reason Golang is the way it is: Google invested a lot of money in it, and in making it fit for their purposes. Oh and BTW, what you get isn’t fit for general-purpose computing, or even for general-purpose distributed computing, b/c their purpose was “to allow people to write clients/servers that use Google services”.

[Everything I write below is about 5 years old, but I’d be (pleasantly) surprised if any of it had changed: b/c I had an interaction with a Google product manager back then, who tried to convince me that I should like all this sludge and should eat it up with a spoon]

This shows up in all sorts of ways, like:

  1. you can call socketpair(), but you can’t convert the integer into a filehandle that you can hand to the networking stack. This means you can’t run GRPC over that socketpair. That is to say, a time-honored way to run a client/server system:

a. parent creates socketpair
b. parent forks child
c. child dup2()'s socket to well-known number
d. child execs to another program, which then can use that socket for IPC with parent

is unavailable if you want to do GRPC for IPC (I’ve sketched the pattern itself in OCaml at the end of this post). Gosh, why would they do that? Because they DGAF that you might want to do this: they only care about you using GRPC to talk to TCP servers (like Google’s).

  2. GRPC doesn’t allow you to do your own accept-loop and just hand connections to GRPC for service. And possibly use your own thread-pool. Gosh, wouldn’t want that.

  3. GRPC is generally speaking monolithic: you can’t substitute your own parts. [And note well that that isn’t necessary: Thrift, both in Golang and in other languages, is not monolithic, and that’s useful]

  4. And the Golang networking stack is similar in this regard: it’s a unified thing, and you can’t pull out parts and use them easily (or sometimes, at all).

  5. Don’t get me started on Golang interfaces: I mean jesus: an interface with two different structs that implement it has THREE DIFFERENT NULL VALUES, all incomparable. Just … madness.

It’s a congeries of special-cases, with spackling spread over it to give the appearance of a unified design.
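Since item 1 refers to it: here is that time-honored pattern spelled out with OCaml’s Unix module, as a sketch only; "/path/to/server" is a placeholder for whatever program the child execs.

```ocaml
(* The classic socketpair / fork / dup2 / exec IPC setup. *)
let () =
  let parent_end, child_end =
    Unix.socketpair Unix.PF_UNIX Unix.SOCK_STREAM 0
  in
  match Unix.fork () with
  | 0 ->
      (* Child: put its end of the pair on a well-known descriptor
         (stdin here), then exec the server, which inherits it. *)
      Unix.close parent_end;
      Unix.dup2 child_end Unix.stdin;
      Unix.execv "/path/to/server" [| "server" |]
  | _pid ->
      (* Parent: talk to the child over its end of the socketpair. *)
      Unix.close child_end;
      let msg = Bytes.of_string "ping\n" in
      ignore (Unix.write parent_end msg 0 (Bytes.length msg))
```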

1 Like

And something else: We can KNOW that they don’t actually use this networking stack inside google. How do we know this? Because you can read the Chubby paper, and then look into the GRPC corpus and see that there’s no spot to wedge in the “stubby” nameserver hooks in GRPC as shipped. So they do something different internally.

This is all sludge shipped out-the-door for us plebes.

1 Like

I don’t know if a small allocation matters when comparing to python. It’s doing tiny allocations all over the place, I think, as part of interpretation. The big object is really what we care about. Nevertheless, I agree that this syntax is nuts.

I really wish we had good-performing OOP. It could have filled in so many of the holes we currently have.

BTW there is a big advantage in having proper multithreading: we’re developing a product where the algorithmic version is in python, and sharing data across processes is a pain.

I know I keep banging on this drum, but have you looked at Rust? It doesn’t have OOP; instead, it has an OOP-like syntax for typeclasses. Look at the sprs sparse-matrix library for instance.

I would say that dynamic trait objects in Rust are objects, even if they’re not much in the way of OOP.

Compared to OCaml, they still give you a dot notation with efficient dynamic dispatch. We don’t really have dot notation besides a#b. That means you need to know what module to look in, rather than just knowing the name of the method, or else use first-class modules. Overall I think first-class modules are the modern OCaml equivalent, but they’re less ergonomic because you need to unpack them to use them.
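As a small illustration of the “unpack before use” ergonomics (a toy example; the Shape module type and Circle module are made up):

```ocaml
module type Shape = sig
  type t
  val area : t -> float
end

(* The first-class module must be unpacked, here via the
   [(module S : ...)] pattern, before S.area can be called. *)
let total_area (type a) (module S : Shape with type t = a) (xs : a list) =
  List.fold_left (fun acc x -> acc +. S.area x) 0. xs

module Circle = struct
  type t = { r : float }
  let area c = Float.pi *. c.r *. c.r
end

let _ = total_area (module Circle) [ { Circle.r = 1.0 } ]
```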

Dynamic traits are OO-like. But almost all usage of traits isn’t dynamic.

I think it depends heavily on the trait. I’d expect many uses of io::Write/BufWrite/Read/BufRead to be dynamic, for example! And I’d love to have the same in OCaml… but never put the effort into finishing the PR to make channels extensible.

I left Go for OCaml and find my reasons affirmed every month.

My issues with OCaml I can work around or improve (me and OCaml). Mine with Go I cannot; they are central.

2 Likes

I agree with everything you wrote here. That said, in the same way that not everyone using OCaml likes it (referring to that other thread), it is the same with Go. I like Go, I just like OCaml much more.

The “extensive stdlib” problem has bedeviled OCaml for a long time. I had hoped (it’s been a long time since I used OCaml for work) that industrial users would fill this gap, and to some extent Jane Street has, but yeah, Go and Python are really batteries included compared to OCaml. Lots of really basic things are missing, or have to be cobbled together from disparate sources. Resizable arrays? GRPC? Yeah, I can find the pieces, but not having them all wrapped up together gives me pause.

I especially agree that numerics in OCaml is a lost cause. I’d love to be proven wrong! I did spend some time with Julia, and that language and community are so far ahead in this that if I were trying to ditch Python for another language, that is likely where I’d turn. Julia’s reason for being is to not have to drop down to C++ or Fortran. That’s pretty cool!

Still, I think OCaml 5 has a chance to leapfrog Go and Java in the concurrency field. Success does not have to mean displacing them; even a small percentage of Go/Java work going to OCaml would be huge.

BTW, Rust, and Rust dot notation, is almost entirely static dispatch. As @Chet_Murthy points out, it’s only dyn traits that are dynamically dispatched. Certainly, if OCaml is going to be the veneer over some better performing systems language, I’d prefer that the systems language be Rust, rather than C or C++.

2 Likes