It is true that OCaml development is more coordinated than planned: there is no single actor with enough resources to explore every prospective change on its own. However, there are regular meetings to try to coordinate all the core compiler developers, but communicating the results of those meetings is not that straightforward.
F# 5 was recently announced with a few interesting new features and a migration to .NET 5.0. Posting here just for inspiration and discussion.
My 2¢ on the stdlib discussion, as someone new to (and enamored with) OCaml.
With Go, I like to target the stdlib because it’s very unlikely to have backwards-incompatible changes. I don’t care about downloading 3rd party packages; I care about being unable to upgrade them because someone decided to rewrite the entire API to be “cleaner”. As an added bonus, I find the stdlib to be high quality and most packages are easy to use.
As far as I’ve seen, OCaml’s stdlib doesn’t make breaking changes very often, or at all. However it’s quite small. That’s OK, OCaml does not have to be like Go.
One thing that frustrates me is the concurrency story. I find concurrency based on futures, promises or whatever you want to be viral in nature; once one call is asynchronous, everything has to become asynchronous. I want to build and share packages that are written as if concurrency didn’t exist, then allow others to glue them together with the concurrency solution of their choice in their executables. But as soon as I want to do something like allow for a timeout for some operation, or provide some streaming API, I have to make a decision about whether to use Lwt, Async, or something else. Or build an abstraction for both. I see many packages like angstrom and cohttp have additional “-lwt” and “-async” packages and I wish they weren’t necessary.
Then why not just use threads? OCaml’s threads are more efficient than Lwt/Async, and there is a full-featured set of the normal primitives for locking/monitors, etc. I built a reasonably-complicated blockchain system with them in a straightforward manner. And instantly you get compatibility with all the C/C++ libraries that also assume pthreads.
The only downside is that if you were planning on supporting tens of thousands of concurrent clients with your networked server, sure/sure/sure, you can’t use threads.
But otherwise, threads work plenty fine.
ETA: Oh, and anybody who’s programmed with pthreads, Java threads, Python threads, etc, already knows how to program with OCaml threads: no fancy monadic I/O abstractions to learn, no tedious syntactic rules to ensure in your code, etc, etc. Just the old rules of how to write thread-safe code.
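For readers who have not used them, here is a minimal sketch of what plain OCaml threads with a mutex look like. The names `counter`, `worker` and `run_workers` are made up for illustration, and it assumes the program is linked against the standard threads library (e.g. `threads.posix` with ocamlfind).

```ocaml
(* Shared state protected by a mutex, pthread-style. *)
let counter = ref 0
let lock = Mutex.create ()

(* Each worker increments the shared counter n times. *)
let worker n () =
  for _ = 1 to n do
    Mutex.lock lock;
    incr counter;
    Mutex.unlock lock
  done

(* Start the threads, wait for all of them, and return the final count. *)
let run_workers ~threads ~per_thread =
  counter := 0;
  let ts = List.init threads (fun _ -> Thread.create (worker per_thread) ()) in
  List.iter Thread.join ts;
  !counter
```

Nothing here is monadic: the code reads like the single-threaded version plus the usual locking discipline.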
What do you mean by this? OCaml threads correspond to OS threads which, in general, do not scale as well as Lwt/Async threads based on most benchmarks I’ve seen. Given that OCaml has a GIL, they carry a fair amount of overhead.
I believe there are a lot of people that would disagree with this statement, in the abstract.
(1) I noted “tens of thousands of clients” etc. If you’re writing a program that’s supposed to be used (like most web-apps) by short-lived concurrent client requests in typical transaction-processing fashion, there’s no good reason to use monadic concurrency: pthreads is fine.
(2) monadic I/O is better when you have threads that don’t do I/O, and context-switching overheads dominate actual work. But when your threads are doing actual work, no.
there are a lot of people that would disagree with this statement, in the abstract.
In theory, there is no difference between theory and practice. In practice, it always depends on what you’re doing. I went back and looked at this thread, ( Lwt vs System threads - #14 by ivg ) and the only benchmark mentioned is “Chinese whispers”. I mean … what bullshit: there’s only one real-world case that corresponds, and that’s writing a pub/sub messaging engine. Yeah, sure: if I’m writing one of those, I’m going to use an event-loop-based design. But for a web-server? Of course not. A web-application server with significant traffic will always have a front-end “real web server” to serve static content and absorb network security attacks. So the web-app-server should be receiving only full requests, with all buffering happening in the front-end web server.
The relevant benchmark is not “chinese whispers” or any other concurrency benchmark. The relevant benchmark is the overhead in reading characters from a file, one-by-one, thru the monadic I/O stack, vs using direct-style. Because that’s what your web-server is going to be doing.
I want to re-emphasize: YES there are situations where you need monadic I/O. Sure. But for almost all code, that’s not the case. The real argument for using monadic I/O is that you can see where your code gets time-sliced, so the chance of surreptitious time-slicing-induced bugs is lower. But then if you go and run multiple threads to execute your promises, you just threw that out the window.
I know what you mean, but I don’t think it is quite as bad as those words sound. The Lwt scheduler runs its event loop in a single thread and ocaml has a global lock for worker threads, so the only operations which require asynchronous calls are those which will not put the CPU to work, such as waiting for input. Where you are putting the CPU to work, you can yield from time to time to let i/o take place. And when you have finished your i/o for the moment (if your application can be finished with it), you can fulfil the promise on which the event loop is running and exit monad-land, and retreat to the sunny uplands of synchrony.
Ocaml 5.00 is likely to solve some of the issues you are mentioning.
- If you don’t want massive concurrency, threads in OCaml 5.00 will be truly parallel. So you can use them to do work that is either I/O bound or compute bound. In OCaml 4.x we depend on threads being I/O bound so they can give up their OCaml runtime lock when they are blocked in a system call. Alternatively, if they are executing pure C code then they can also give up their runtime lock and become truly parallel in OCaml 4.x. However, in OCaml 5.00 all these complications will be gone. You can have OCaml code that will run in parallel on multiple cores. I totally agree with @Chet_Murthy’s advice that you can avoid polluting your code with Lwt/Async in many cases and not worry about it even today in OCaml 4.x: just use threads!
- Effect handlers in OCaml 5.00 could solve a lot of your complaints. Any effect can be interpreted by an effect handler in an arbitrary fashion. So you could “bring” your own scheduler. Essentially you specify what you want your program to do, and the effect handlers can decide how exactly they want to execute it: using io_uring, threads, something else… Though in practice we’re always concerned about the implementation details for performance/data sharing reasons. But effect handlers will allow a bit more decoupling to happen between OCaml code and how it is executed, without being mired in the level of detail that it is today. One might argue that functors allow you to plug in backends too (many packages have async and lwt support). I think effect handlers will be lighter weight and more elegant than functors. At least that is how I understand it. There are some neat examples here: GitHub - ocaml-multicore/effects-examples: Examples to illustrate the use of algebraic effects in Multicore OCaml
P.S. I forgot the biggest advantage of all – avoiding Lwt/Async monadic style concurrency. Everything can be written as “normal” and clean ocaml code but in the reality behave exactly how you want (from a concurrency perspective).
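To give a rough idea of the effect-handler point, here is a small sketch using the `Effect` module from the OCaml 5 standard library. The effect `Read_line` and the functions `greet` and `run_with_input` are hypothetical names invented for this example; a real scheduler such as the one in eio is far more involved.

```ocaml
(* Requires OCaml >= 5.0. *)
open Effect
open Effect.Deep

(* A hypothetical effect: "give me the next line of input". *)
type _ Effect.t += Read_line : string Effect.t

(* Library code in direct style: it does not know how input is obtained. *)
let greet () =
  let name = perform Read_line in
  "hello, " ^ name

(* One possible handler: serve Read_line from a fixed list of strings.
   Another handler could read from a socket or suspend the fiber instead. *)
let run_with_input (inputs : string list) (f : unit -> 'a) : 'a =
  let remaining = ref inputs in
  try_with f ()
    { effc = (fun (type b) (eff : b Effect.t) ->
        match eff with
        | Read_line ->
          Some (fun (k : (b, _) continuation) ->
            match !remaining with
            | [] -> discontinue k End_of_file
            | x :: rest -> remaining := rest; continue k x)
        | _ -> None) }
```

The caller picks the interpretation at the edge of the program, e.g. `run_with_input ["world"] greet`, while `greet` itself stays ordinary direct-style code.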
Anyone interested in your second point should definitely have a look at the eio project: GitHub - ocaml-multicore/eio: Effects-based direct-style IO for multicore OCaml
It’s still a work in progress but should give you an (exciting) idea as to what effects-based concurrency could look like and how well it could perform.
Hmm … are you talking about systhreads? If so, I think systhreads is still a concurrency library rather than a parallel library. As I understand it, this is so that OCaml 5.0 maintains backward compatibility for programs using systhreads. Is this not true?
You may be correct regarding systhreads. To be exactly correct, what I should say:
“If you don’t want massive concurrency, domains in OCaml 5.00 will be truly parallel”
Would you agree with this statement?
The unit of parallelism in Multicore OCaml are domains; each systhread will belong to a given domain. Within a single domain systhreads will continue to be cooperatively scheduled (only one systhread will run at a given time), mostly for backwards compatibility reasons (when there is only one domain). However, systhreads belonging to different domains will truly execute in parallel.
Of course, in Multicore OCaml you may not need or want to work with systhreads at all, and use the domain API instead (see GitHub - ocaml-multicore/parallel-programming-in-multicore-ocaml: Tutorial on Multicore OCaml parallel programming with domainslib).
Cheers,
Nicolas
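As a small illustration of domains being the unit of parallelism, here is a sketch that uses only `Domain.spawn` and `Domain.join` from the OCaml 5 standard library. The helper names `sum_range` and `parallel_sum` are made up for the example, and it does not use domainslib.

```ocaml
(* Requires OCaml >= 5.0. *)

(* Sum the integers lo..hi sequentially. *)
let sum_range lo hi =
  let acc = ref 0 in
  for i = lo to hi do
    acc := !acc + i
  done;
  !acc

(* Split the range in two: one half runs in a freshly spawned domain,
   the other half in the current domain, and the results are combined. *)
let parallel_sum n =
  let mid = n / 2 in
  let d = Domain.spawn (fun () -> sum_range 1 mid) in
  let right = sum_range (mid + 1) n in
  Domain.join d + right
```

The two halves share no mutable state, so no locking is needed; systhreads created inside either domain would still take turns within that domain.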
Ah, so the systhreads semantics hasn’t changed. Thanks for confirming. Yes, I agree with your statement.
Unless they belong to different domains as described by nojb above…
Just read the nov update.
What does everyone mean when they refer to “semantics?”
In general, semantics is the study of the meaning of a language (be it a programming one, or a vernacular one like English, Chinese, Spanish…). But in the case of programming languages, we distinguish denotational and operational semantics. Briefly, the former studies the value represented by an expression (what do I compute) and the latter the way the value is computed at runtime (how do I compute it).
For instance these two values:
let i = 21
let j = (fun x -> x * 7) 3
denote the same value (namely the int 21) but operationally they differ since the latter needs some computation at runtime.
In the case of systhreads, we are interested in its operational semantics (how computations are interleaved at runtime) since the denotation of an expression may depend on it when, for instance, a mutable value is accessed (read/write) concurrently.
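One way to make that operational difference visible is to count the work done at runtime. This is only a sketch: `calls` and `times7` are names invented here, and the counter is just a stand-in for "some computation happened".

```ocaml
(* Counts how many times the multiplication is actually performed. *)
let calls = ref 0

(* Same function as (fun x -> x * 7), instrumented with a side effect. *)
let times7 x = incr calls; x * 7

let i = 21        (* a literal: nothing to compute *)
let j = times7 3  (* same denotation, but one call happens at runtime *)
```

Both `i` and `j` are equal as values, yet only the second binding bumps `calls`. With mutable state shared between threads, that kind of "how" can leak into the "what".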
@kantian is right, but more informally, in a programming language context, I think of “semantics” as what the language elements do, how they work, etc. For example, the semantics of + are different in OCaml and Haskell. That’s a semantic difference. In OCaml + only operates on integers, whereas in Haskell it operates on other sorts of numbers as well. However, given two integers within a certain restricted range, + does the same thing (produces the same output) in both languages. So that’s a semantic similarity. This can all be spelled out more precisely in the terms that kantian introduced, and for some purposes that kind of precision is absolutely essential, but the general idea can be understood without it.
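On the OCaml side, the point about + looks like this in practice (the binding names are just for illustration):

```ocaml
(* + is integer addition only. *)
let int_sum = 1 + 2

(* Floats use a separate operator, +. *)
let float_sum = 1.5 +. 2.5

(* The following would be rejected by the type checker,
   because + expects int arguments:
   let bad = 1.5 + 2.5 *)
```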
In the context that I have used, ‘semantics’ = ‘behaviour’, i.e. the ‘semantics/behaviour’ of systhreads has remained the same in OCaml 5.0 as it was in OCaml 4.0. For background: in OCaml 4.0, systhreads is more of a concurrency library than a parallel programming library, i.e. it is designed to be run on a single core rather than on multiple cores. So OCaml 5.0 maintains this semantics/behaviour when run on one core but, according to the information above, we can now run systhreads on more than one core at the same time.