Application-specific Improvements to the Ecosystem

bluddy · July 26, 2022, 1:54pm

Continuing the discussion from OCaml non orthogonal featrures (was Simplification of OCaml as a design goal?):

I think it’s worthwhile to continue the direction of conversation in the previous thread, and to discuss how the ecosystem can be improved for people who program in their specific fields, but not in OCaml.

For example, using OCaml for light scripting requires IMO a very broad API for file and string manipulation, but one that doesn’t bog you down in error handling. This means deliberately using exceptions rather than the error monad.
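To make the exceptions-first style concrete, here is a minimal sketch of the kind of script helper meant here, using only the standard library’s In_channel/Out_channel modules (assumes OCaml >= 4.14). The helper names `write_file` and `read_lines` are hypothetical; a failure such as a missing file surfaces as a `Sys_error` exception rather than a `result` value, so call sites stay free of error plumbing.

```ocaml
(* Hypothetical exception-first script helpers; needs OCaml >= 4.14
   for the In_channel and Out_channel modules. *)

(* Write [contents] to [path], truncating any existing file.
   Raises Sys_error on failure. *)
let write_file path contents =
  Out_channel.with_open_text path (fun oc ->
      Out_channel.output_string oc contents)

(* Read [path] and split it into lines, dropping empty ones.
   Raises Sys_error if the file cannot be opened. *)
let read_lines path =
  In_channel.with_open_text path In_channel.input_all
  |> String.split_on_char '\n'
  |> List.filter (fun line -> line <> "")

let () =
  let path = Filename.temp_file "demo" ".txt" in
  write_file path "alpha\nbeta\ngamma\n";
  (* Prints ALPHA, BETA and GAMMA on separate lines. *)
  read_lines path |> List.map String.uppercase_ascii |> List.iter print_endline;
  Sys.remove path
```

If the file is missing, the uncaught `Sys_error` ends the script with a message, which is roughly what a scripting user expects.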

I’m not sure how much value can be obtained by OCaml in the data science space because ultimately, the main issues have to do with array dimensionality, and that simply isn’t handled by most type systems outside of dependent types. Also, a lot of functionality is buried inside numpy’s array indexing operators and that’s hard to match with OCaml’s explicitness (as can be seen in Owl).

I’d love to hear about more domains where you don’t use OCaml in your daily job, but would like to.

MaxHaydenChiz · July 26, 2022, 2:52pm

Two related questions that will shed light on the first are:

What are the best projects for using OCaml, the ones where it is hands down, far and away the best choice?
What are the marginal projects where OCaml would be fine, but where people can (or usually do) go with other languages and tools? And what would it take to make OCaml a clear winner?

In terms of improving the language, those are going to generate more immediate, actionable ideas.

P.S. I think modular implicits would go a long way towards making Owl much more useful. And you can always make really high quality interfaces with R and Python for calling core algorithms and libraries if the surrounding code is better expressed in OCaml. Even if OCaml isn’t going to unseat the reigning languages, I think a lot can be gained from trying to smooth out the rough edges that are visible when you compare Owl to alternative options.
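For readers unfamiliar with the proposal: what modular implicits would automate is the explicit module passing OCaml already supports with first-class modules. A minimal sketch, where the `NUM` signature and the `Int_num`/`Float_num` modules are made up for illustration (this is not Owl’s API). Today the caller must name the module at every call site, which is the kind of noise numeric code runs into; under the modular implicits proposal the compiler would select the module from the argument types instead.

```ocaml
(* A hypothetical numeric signature; illustrative only. *)
module type NUM = sig
  type t
  val zero : t
  val add : t -> t -> t
end

module Int_num = struct
  type t = int
  let zero = 0
  let add = ( + )
end

module Float_num = struct
  type t = float
  let zero = 0.0
  let add = ( +. )
end

(* Generic sum: the dictionary of operations is passed explicitly
   as a first-class module. *)
let sum (type a) (module N : NUM with type t = a) (xs : a list) : a =
  List.fold_left N.add N.zero xs

let () =
  (* The module has to be named at every call site. *)
  Printf.printf "%d\n" (sum (module Int_num) [ 1; 2; 3 ]);
  Printf.printf "%.1f\n" (sum (module Float_num) [ 1.5; 2.5 ])
```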

bluddy · July 26, 2022, 4:12pm

I think OCaml is the clear winner in compilers, type checkers, etc., and that’s what it’s used for the most. This is quite clear. However, as was said in the previous thread, the strengths and deficits of the ecosystem often affect which applications are popular. Often it takes the right libraries to make a language shine. For example, without numpy, Python would not have been a good candidate for data science.

Modular implicits are a pipe dream, and are not worth spending time thinking about. We need to figure out ways to improve the ecosystem given what we have.

hyphenrf · July 26, 2022, 5:32pm

NOOO DON’T SAY THAT NOOOoooo…

octachron · July 26, 2022, 6:17pm

Modular implicits is neither a pipe dream nor a feature that you should wait for when building new libraries: it is a research project.

MaxHaydenChiz · July 26, 2022, 6:21pm

In a sense, all of this is a “pipe dream” unless someone cares enough to do it.

The ultimate bottom line is that I don’t have the time or resources to add any of this stuff to the ecosystem. And unless someone else does, it’s not going to get done. So if this wasn’t a request for ideas for a project you could work on, I’m not sure what the point is. And if it was such a request, then modular implicits is a thing I’d like someone to work on.

But since you did ask:

I think OCaml is generally good with complex data structures, implementing some protocol or standard, and general “business logic” branch heavy code. Compilers and language tools are just a special case of that.

This is where the pattern matching and exhaustiveness checks shine. The bulk of bugs in most large programs comes from complex control flow, and synchronized if statements in particular (because they break the correspondence between each line of static code representing one point in the dynamic control flow of the program).

In principle, you can encode that sort of information using templates or inheritance systems and dynamic dispatch, but in practice, the requirements for business logic change so often and are so arbitrary that it’s usually just better to use pattern matching and rely on the compiler to enforce consistency throughout the program.
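A minimal sketch of the exhaustiveness point, using a made-up order-processing domain (the `order_state` type and `next_action` function are hypothetical). Every decision that depends on the state lives in one `match` with no top-level wildcard, so if a new state such as `Refunded` is added later, the compiler reports a non-exhaustive match (warning 8) in every function that has not yet been updated.

```ocaml
(* Hypothetical business states for an order. *)
type order_state =
  | Pending of { items : int }
  | Paid of { amount_cents : int }
  | Shipped of { tracking : string }
  | Cancelled of { reason : string }

(* Each constructor is listed explicitly; there is no top-level [_] case,
   so the compiler re-checks this function whenever the type changes. *)
let next_action = function
  | Pending { items = 0 } -> "reject: empty order"
  | Pending { items } -> Printf.sprintf "await payment for %d item(s)" items
  | Paid { amount_cents } when amount_cents > 100_000 -> "manual review"
  | Paid _ -> "ship"
  | Shipped { tracking } -> "notify customer: " ^ tracking
  | Cancelled { reason } -> "archive (" ^ reason ^ ")"

let () =
  [ Pending { items = 0 }; Pending { items = 2 };
    Paid { amount_cents = 250_000 }; Paid { amount_cents = 999 };
    Shipped { tracking = "ZX-1" }; Cancelled { reason = "duplicate" } ]
  |> List.iter (fun s -> print_endline (next_action s))
```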

In any code like that, the only reason I don’t use OCaml is that the startup cost is too high. I can’t use some kind of lightweight IO stuff like I can in Perl. I don’t have a good database driver. Etc. So in practice, OCaml only gets brought out when it’s a big mission-critical thing that I know in advance is going to get involved. It doesn’t get used for experimental code that has the potential to evolve that way.

IMO, this is where the biggest / easiest improvements are. Reduce friction on the small / hobby project end of things and build out the libraries and features that make that kind of work easier to do.

Now that we have multicore, if OCaml gets libraries for handling irregular parallelism in a sane way (e.g. nested data parallelism, or the newer research on GPU acceleration of business logic tasks), that would make it a lot more compelling. The same goes for algebraic effects, if they improve the ability to reason about resource usage in a way that improves on the RAII used in C++ and Rust.
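For a sense of what such a library would abstract away, here is a hand-rolled fork-join over an unbalanced tree using only OCaml 5’s `Domain` module (assumes OCaml >= 5.0; the `tree` type and the depth cutoff of 2 are arbitrary choices for illustration). Domains are heavyweight, roughly one per core, so the code has to stop spawning after a fixed depth, and on a lopsided tree like this one that fixed split puts nearly all the work on one domain. Balancing irregular shapes like this automatically is the job a work-stealing scheduler or a nested data parallelism library would take on.

```ocaml
(* Requires OCaml >= 5.0 for the Domain module. *)
type tree = Leaf of int | Node of tree * tree

let rec sum_seq = function
  | Leaf n -> n
  | Node (l, r) -> sum_seq l + sum_seq r

(* Fork-join with a depth cutoff: while [depth] > 0, the left subtree is
   summed on a freshly spawned domain while the current domain handles the
   right one. Below the cutoff, fall back to sequential code. *)
let rec sum_par depth = function
  | Leaf n -> n
  | Node (l, r) when depth > 0 ->
      let left = Domain.spawn (fun () -> sum_par (depth - 1) l) in
      let right = sum_par (depth - 1) r in
      Domain.join left + right
  | t -> sum_seq t

(* A deliberately lopsided tree: a long left spine with leaves 1..n. *)
let rec build n = if n <= 1 then Leaf n else Node (build (n - 1), Leaf n)

let () =
  let t = build 10_000 in
  Printf.printf "sequential=%d parallel=%d\n" (sum_seq t) (sum_par 2 t)
```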

Another area where I’d like to use OCaml more would be random IoT gizmos on a Raspberry Pi and general administrative scripts for managing my computer infrastructure. For the former, Python has more day-to-day libraries for any random task and OCaml doesn’t have a good, low friction way to interface with Python code. For the latter, OCaml doesn’t have a good Windows story right now.

The Windows thing is hard, but a better, Python-specific FFI is something someone could do. I don’t know how Julia manages PyCall.jl, but that’s the level I’d like things to be at (and it has made doing scientific work in Julia essentially “free”, since interfacing to the massive existing libraries is basically free).

If the barrier to formal verification comes down or if Coq gets more accessible, I’ll probably start using OCaml a lot more as a result. But just learning the basics is hard and I’m nowhere near good enough to use those tools efficiently. I’d really like to be, though. And as I said in the other thread, most practical code bases are distributed systems, and I’d really like to be able to make the kind of formal guarantees for the entire system that you can get from the type system for an individual process. But I don’t know how to do that easily, and there doesn’t seem to be accessible tooling for that purpose.

Similarly, I’d like to know more about how to do soft real-time work in OCaml, but I don’t understand enough about how to tune the GC and the allocations. So for anything where I have memory and timing constraints, I default to C/C++/Rust. But this could just be a documentation issue. People do use OCaml for this; I just don’t know how I’d learn to do it in the context of a hobby project.
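As a starting point on the GC side, the standard library’s `Gc` module exposes both the tuning knobs and the allocation counters. A minimal sketch, where the minor heap size below is a placeholder rather than a recommendation (good values depend on the workload, and some `Gc.control` fields are ignored under OCaml 5): measure how many words a code path allocates, then enlarge the minor heap so short-lived garbage triggers fewer collections.

```ocaml
(* Words allocated on the minor heap while running [f ()]. Allocation is
   what drives GC work, so this is the first number to check when chasing
   latency. The result is approximate: the counter calls allocate too. *)
let minor_words_of f =
  let before = Gc.minor_words () in
  let result = f () in
  let after = Gc.minor_words () in
  (result, after -. before)

let () =
  (* Placeholder: 1M words (8 MiB on 64-bit) means fewer, but individually
     longer, minor collections. *)
  Gc.set { (Gc.get ()) with Gc.minor_heap_size = 1024 * 1024 };
  let _, words =
    minor_words_of (fun () ->
        (* Sys.opaque_identity keeps the optimiser from dropping the list. *)
        Sys.opaque_identity (List.init 1_000 (fun i -> i)))
  in
  Printf.printf "allocated ~%.0f words\n" words;
  let s = Gc.quick_stat () in
  Printf.printf "minor collections so far: %d\n" s.Gc.minor_collections
```

The `s` parameter of `OCAMLRUNPARAM` controls the same minor heap size without recompiling, which is handy for experimenting.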

As for where it doesn’t get used:

If I have a standardized problem with a standardized solution and I’m just plugging Lego blocks together, odds are the thing more people are using has more eyeballs on it and is going to work just fine.

If I’m doing mathematical or statistical work (which at this point is most of what I do), I want the code I type to be as close to my actual math as possible. Any mismatch is where my bugs are going to come from. The way you can use Unicode in Julia is really powerful and has pulled me into using that language for certain types of problems instead of just doing it in Matlab, for example.

Similarly, if I’m doing visualizations or some other presentation, OCaml doesn’t have the library and the tooling. I mostly hate every library in existence for this though. Everything has problems and limitations. And if someone could make a good OCaml one, I’d hop on board immediately.

P.S. I wouldn’t say “never” on modular implicits. People used to say that about multicore. And we eventually got multicore. I’m not holding my breath or building a project around modular implicits being there in the short term. But I am going to point out situations where it would help. (Like with Owl).

I do think being able to specify memory layout stuff is going to come sooner and will have a more immediate impact on more code. IIRC, from a YouTube presentation I saw, Jane Street already has an internal prototype.

MaxHaydenChiz · July 26, 2022, 7:05pm

Reflecting on this conversation, the tl;dr is that the single biggest thing someone could do would be to make interfacing with Python seamless in both directions.

I should never be in a position where I say, “this seems like a great job for OCaml, but Python has such good libraries that it isn’t going to be worth spending the time using FFI to access them” or, “my clients mostly want to use my library from Python and R so I’ll have to write my code in C++ to make that feasible.”

If it’s basically costless to write my piece of the code in OCaml instead of in Python or C++ that is going to be compiled to interface with Python, then I’ll write more of my code in OCaml, especially now that it has multicore and can be used to speed up stuff without having to deal with the complexities of C++ to do it.

But it basically needs to be automatic to the point of being automagical.

dbuenzli · July 26, 2022, 8:16pm

I always found the notion of script vs program to be a dubious one. Things that start like scripts without error handling end up being brittle and user hostile programs that run critical infrastructure.

Always write scripts with the attention you would give to a program, and that includes error handling. In other words: don’t write scripts, write programs.

For that I don’t think the error monad with let* bindings bogs you down in error handling. Quite the contrary. Go fast, don’t care too much about error handling, just let* your calls. By doing nothing you already have a solution vastly superior to a shell script as far as error handling goes. When errors pop up with a lack of proper context gradually insert Result.map_error at the right places to improve the error messages.
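A minimal sketch of that workflow with only the standard library (`Result.bind`, `Result.map_error`; the `( let* )` operator is defined locally, and `lookup`, `parse_port` and `server_address` are hypothetical). The first draft just chains calls with `let*`; the `Result.map_error` line is the kind of context you add later, once an error message turns out to be too terse.

```ocaml
(* The error-monad binding operator, built from the stdlib. *)
let ( let* ) = Result.bind

(* Hypothetical step: look up a key, failing with a terse message. *)
let lookup key env =
  match List.assoc_opt key env with
  | Some v -> Ok v
  | None -> Error (Printf.sprintf "missing key %S" key)

(* Hypothetical step: parse a TCP port number. *)
let parse_port s =
  match int_of_string_opt s with
  | Some p when p > 0 && p < 65536 -> Ok p
  | Some _ | None -> Error (Printf.sprintf "invalid port %S" s)

let server_address env =
  let* host = lookup "HOST" env in
  let* raw = lookup "PORT" env in
  (* Context added later, once "invalid port" alone proved too vague. *)
  let* port =
    parse_port raw |> Result.map_error (fun e -> "reading PORT: " ^ e)
  in
  Ok (host, port)

let () =
  match server_address [ ("HOST", "localhost"); ("PORT", "80x") ] with
  | Ok (host, port) -> Printf.printf "%s:%d\n" host port
  | Error e -> prerr_endline ("error: " ^ e)
```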

I completely stopped writing shell scripts in favour of OCaml. Maybe it takes me 5 minutes instead of 2, but the long-term benefits for maintenance and program evolution are well worth it.

The only real burden is that it’s not there by default on machines like shells are. So I end up installing opam on the machine with a dedicated machine switch and then have these runes:

#!/usr/bin/env opam exec --switch machine -- ocaml
#use "topfind"
#require "mywonderfullibrary"

beajeanm · July 26, 2022, 9:49pm

If you are looking for popularity, this is probably the wrong question. I mean, what’s the best language for writing a CRUD webapp? I can ask 10 developers and I’ll probably get 10 different responses.
To be a popular language you need to be a solid choice in different categories.
If someone decides that Java/.Net/Rust/Python (we can probably list several others) is their main language, they know that whatever the technical problem du jour is, they probably have a solid framework to solve it.

Would you recommend OCaml for:

A webapp? Maybe; it depends on your opinion of Ocsigen, the only project that seems mature enough.
A financial system? Despite the work of Jane Street, you’ll have to be ready to reinvent the wheel, since many building blocks are not publicly available.
Developing a cloud based app? No major cloud provider has an OCaml SDK, and the open source one the community has developed is fairly limited.

Even our good libraries are probably not what the wider developer community expects to find in a “modern” ecosystem.
E.g. I love caqti, but a developer coming from a different community would probably expect something closer to Diesel, where all your basic queries are generated for free by the framework. And I’m picking Diesel because Rust sits in the same strongly typed corner, but you can find equivalents in most mainstream languages.

But, as you mentioned, fixing any of that is a lot of thankless effort from whoever cares enough about it.

hyphenrf · July 26, 2022, 10:22pm

#!/usr/bin/env opam exec --switch machine -- ocaml

I thought you weren’t allowed to have this many things in a shebang

roddy · July 26, 2022, 10:36pm

For web apps, Dream is definitely mature enough to use instead of e.g. Flask in Python or Express in JS.

For SQL, I’m a bit biased, but I think ppx_rapper is nicer than ORMs in many cases. Nevertheless, there are many cases where you would like a light ORM to do basic CRUD, and I’d love it if someone ported Diesel or something similar to OCaml.

dbuenzli · July 27, 2022, 8:23am

2 posts were split to a new topic: Running OCaml scripts

c-cube · July 26, 2022, 11:20pm

Not saying let* is bad, or error handling overrated; but you can’t just replace let with let* and hope it works. It contaminates everything, including functions (which is good when you want signatures to reflect errors!), and it doesn’t interface well with lists and other containers. The ideal solution would be typed effects/typed exceptions that would compose well with OCaml’s control flow constructs (like loops and higher-order function calls).

dbuenzli · July 26, 2022, 11:36pm

What you see as contamination, I see as honesty [1]. Lists can be handled with suitable combinators and I’m not sure your ideal solution changes much in practice. But more importantly, even Rust has a special syntax for it, so it can’t be wrong :–)

[1] Which is quite different from contamination from a concurrency monad, where I need to suffer the contamination even if concurrency is not a concern for my code.
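On the point about lists: the standard library has no `result`-aware `List.map`, but a short-circuiting one takes a few lines. A minimal sketch (`map_result` and `parse_int` are hypothetical names; some third-party libraries ship an equivalent under their own names): it stops at the first `Error` and otherwise returns the mapped list in the original order.

```ocaml
(* Map [f] over [xs], stopping at the first Error. Tail-recursive, with
   the accumulator reversed once at the end. *)
let map_result f xs =
  let rec go acc = function
    | [] -> Ok (List.rev acc)
    | x :: rest -> (
        match f x with
        | Ok y -> go (y :: acc) rest
        | Error e -> Error e)
  in
  go [] xs

let parse_int s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an integer: " ^ s)

let () =
  let show = function
    | Ok ns -> String.concat "," (List.map string_of_int ns)
    | Error e -> "error: " ^ e
  in
  print_endline (show (map_result parse_int [ "1"; "2"; "3" ]));
  print_endline (show (map_result parse_int [ "1"; "two"; "3" ]))
```

Inside a `let*` chain this composes directly, e.g. `let* ports = map_result parse_int raw_ports in ...`.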

jumpnbrownweasel · July 27, 2022, 2:04am

By “memory layout stuff” do you mean the unboxed types proposal (https://github.com/ocaml/RFCs/pull/10), or something else?

jumpnbrownweasel · July 27, 2022, 2:10am

I very much agree. Even though monads work well and are fairly explicit, they are not beginner friendly and the possibility of doing without them in a functional language will be a differentiator for OCaml. I really believe the multicore and event system changes will attract a lot of new interest, which will help with other things because more people will be involved to help.

jumpnbrownweasel · July 27, 2022, 2:22am

A little more: Over the years I’ve looked at OCaml from time to time, but each time I found that there was still no multicore support which ruled it out for me, rightly or wrongly, and each time I had less confidence that it would ever happen. Now that it is really coming and seems done so nicely, I’m looking at OCaml from a fresh viewpoint. I can’t be the only one.

Gopiandcode · July 27, 2022, 3:16am

The main problem I have with this approach is that when an error does pop up, you don’t have much context - especially if you’re interfacing between several libraries. In contrast, at least with exceptions you can re-run with OCAMLRUNPARAM="b" and get a proper stacktrace.
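For reference, the `b` flag in `OCAMLRUNPARAM` can also be switched on from inside the program with the standard library’s `Printexc.record_backtrace`. A minimal sketch (`parse_config` is a made-up stand-in for a failing library call); the trace only includes file and line information if the program was compiled with debug info (`-g`).

```ocaml
(* Same effect as running with OCAMLRUNPARAM=b, but switched on from code. *)
let () = Printexc.record_backtrace true

(* Hypothetical step that fails somewhere inside library code. *)
let parse_config s = int_of_string s

let () =
  match parse_config "not-a-number" with
  | n -> Printf.printf "parsed %d\n" n
  | exception (Failure _ as exn) ->
      (* Grab the trace right away: a later raise would replace it. *)
      let trace = Printexc.get_backtrace () in
      Printf.eprintf "%s\n%s" (Printexc.to_string exn) trace
```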

dangdennis · July 27, 2022, 3:37am

ppx_rapper looks neat. And I like how the OCaml site prettifies the GitHub docs (ppx_rapper 3.1.0 · OCaml Package).

@tmattio how would you say your experience is with dream now that the v3 site is out?

Chet_Murthy · July 27, 2022, 3:47am

Funny thing: in Rust, they effectively have let* for the error monad, and it works fine. And yeah, it contaminates lots of places, and (with @dbuenzli ) it’s “honesty”. I don’t see any problems with it in Rust. And I’m a guy who was and remains energetically anti-monad.
