- Corrected the post.
- Regarding “Working with huge data chunks”: that is postponed for a while. I have focused my efforts on porting my codebase and dependencies to 4.08, to improve the exceptions experience.
- OK, will try to keep myself away from “pings”.
At my current day job, the favorite tool for this is Ach. It’s sufficiently cool that I’ve half a mind to whip up an OCaml interface to it. Alas, it is a hot summer here in San Francisco, and I have too many wonderful things to do outside the house in my copious spare time to take on another hobby project. Maybe someone else will pick up this idea and run with it.
You have to look at what Facebook uses OCaml for first.
Here are some publicly known projects that Facebook uses OCaml for:
- Hack is written in OCaml. Their entire backend is built upon it, so I don’t think multicore has much value to them on this front. There is rehp, which is essentially an OCaml -> Hack compiler, so they can reuse code (like validations) on both the backend and the frontend.
- Flow: again, the compiler is written in OCaml, but I think that with Flow’s declining user base there won’t be any investment from Facebook in Flow.
- Pfff is a code analysis project.
As you can see, Facebook doesn’t have anything that really needs the power of multicore.
Code analysis tools (Pfff and Infer) are exactly the kind that need multicore to work properly on large codebases.
Is this project still alive?
Hack is using workarounds because OCaml doesn’t have built-in multicore support. See:
- https://www.youtube.com/watch?v=uXuYVUdFY48&t=0s&list=WL&index=28
- hphp/hack/src/heap/hh_shared.c in facebook/hhvm on GitHub (master branch)
- A parallel and shared memory library based on Hack's implementation
There was an article about this shared memory system too, but I can’t find it.
To be honest, BAP works like this not because OCaml lacks multicore. Imagine we had multicore OCaml (or BAP were written in Haskell or F#), and the core of BAP used all those threads and so on. You would still be using one core to analyse a 100 MB file. That would be really depressing.
The problem with program analysis is that it is hard, if not impossible, to parallelize. The good news is that there are still some options (which, as always, come with a price). In BAP 2.0 we have a fresh new incremental disassembler, so you can now run disassembly on several pieces of code in parallel and then join the results (depending on the connectedness of the control flow graph, this could make things either faster or slower than the single-threaded implementation).
And yes, we are planning to parallelize it eventually, some time after the BAP 2.0 release. But under no circumstances would we even consider using shared-memory parallelism for that, for many reasons (including all the reasons @Chet_Murthy has mentioned; in addition, the heap is already the scarcest resource we have, so we actually need parallelism to split our heaps).
To summarize, OCaml multicore is definitely not a roadblock for BAP’s parallelism, nor is it a dependency.
To me personally, the main output of OCaml multicore is not the multicore itself, but rather the effect system, which is an important development in type theory and static analysis. And I wish those projects could actually split, because one is unnecessarily blocking the other.
TL;DR: the notion of a “function” itself is undecidable, so yes, if we had an oracle that told us which parts are the functions, we could run our analysis in parallel, but in real life no functions are given to us.
And here comes the main difference from regular program analysis. In binary analysis (or reverse engineering in general) we don’t have the ground truth of the control flow graph, so we have to start from zero knowledge and apply our analysis function until a fixed point is reached. Given that the analysis is inherently undecidable, the true fixed point cannot be reached in general, so any graph that we produce is an under-approximation. For example, analysing some initial approximation of a subroutine may yield a result that changes the set of edges in the graph, which in turn repartitions the graph into a different quotient set of subroutines, and so on. If we apply classical map/reduce here, we may end up in a situation where each worker invalidates the results of all the other workers, so in a naive implementation, instead of a 1/n improvement we would get a Γ(n,1) slowdown (O(n!) in big-O notation).
Therefore, how to parallelize a fixed-point computation is a big research question. But this is not to say that BAP couldn’t benefit from running some computations in parallel. Fortunately, most programs (especially well-behaved ones) are linear in nature and do not exhibit the level of mutual recursion that would render parallel reasoning useless. But we definitely do not need multicore support to implement this.
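To make the fixed-point vocabulary above concrete, here is a minimal, sequential sketch in OCaml (not BAP’s API): block discovery as repeated application of a step function until nothing changes. The edge list and all names are hypothetical. Unlike real binary analysis, the edge set here is fixed up front; in a disassembler each step could change the edges themselves, which is what makes splitting this loop across workers hard.

```ocaml
(* Apply [f] repeatedly until the result stops changing. This terminates
   only if [f] eventually stabilises; here it does because [step] only adds
   addresses drawn from a finite edge list. *)
let rec fixpoint ~equal f x =
  let x' = f x in
  if equal x x' then x else fixpoint ~equal f x'

module IntSet = Set.Make (Int)

(* Hypothetical control-flow edges between block addresses. *)
let edges = [ (0, 4); (4, 8); (4, 12); (12, 0); (16, 20) ]

(* All known successors of the block at address [a]. *)
let successors a =
  List.filter_map (fun (src, dst) -> if src = a then Some dst else None) edges

(* One analysis step: add every successor of an already-known block. *)
let step known =
  IntSet.fold
    (fun a acc ->
      List.fold_left (fun acc b -> IntSet.add b acc) acc (successors a))
    known known

(* Blocks reachable from [entry], computed as a least fixed point. *)
let reachable entry = fixpoint ~equal:IntSet.equal step (IntSet.singleton entry)
```

For example, `IntSet.elements (reachable 0)` stabilises after three steps at `[0; 4; 8; 12]`; the block at 16 is never discovered because nothing reachable from 0 points to it.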
To answer the specific question “Why doesn’t Facebook sponsor the Multicore project?”: to an outsider like me, it appears that money-related decision-making within Facebook is complex and unpredictable. Many great people within Facebook are using OCaml and/or Reason, and they would of course like to support the external projects that are important to the OCaml and Reason communities, but it looks like they are not able to find the management level at which to make these things happen; for example, Facebook was not in the list of sponsors of this year’s ReasonConf. This is a bit strange, but very large companies are always strange in many ways; I guess we just have to live with it.
(It also makes it easier to realize how lucky we are in the OCaml community to have companies that are willing to fund projects of interest to the wider community and that are extremely approachable.)
[avsm@ is right, very, very right, that we should try to stay on-topic. mea culpa, mea maxima culpa. So heeere goes]
I’ve spent some time in large companies that have external developer-focused marketing presence. Some data-points:
[second-hand] (1) it is well-known that even though MSFT developed the CLR, major portions of the Windows team pushed-back on it, b/c they did not believe it adequately managed memory. It was explained to me that sure/sure/sure maybe the CLR is fine for many things, but for system services, its memory-profligacy (and probably also cache-unfriendliness) was unacceptable.
[first-hand, saw it myself] (2) we all hear about how Google is so heavily into Golang. Yeah. Right. Sure. Fine. Bye. When I worked there, there were enormous factions of folks who would never use Golang, and while sure, some systems were written in Golang, THE language you had to know wasn’t Golang: it was C++. With Python in second place, and another language I cannot mention in third.
I mean … sheesh. The external marketing was incredible [in the sense of “not credible”] compared to the real experience.
So, to respond to first-para/last-sentence: yes, these big companies are feudal empires, and they contain multitudes. The thing is, the people who come to you from the ReasonML camp in FB have NO GOOD REASON to tell you about all this: they want you to think about FB as a place to work, and as a vital contributor to OCaml. So of course they’re gonna pretend that FB is 100% all-in on OCaml. As I related above, that’s pretty much what the Golang sycophants pretend is happening in Google.
My first manager in IBM (in 1995) once told me something: “you don’t believe IBM’s marketing, b/c you know the inside story; why do you so eagerly accept other companies’ marketing?”
He. Was. Right.
Don’t believe marketing from folks from I/T companies: they have every reason to deceive you.
I would be remiss if I did not address the last para. Jane Street is exemplary, and regardless of what each of us might think about the technical merits of their software, the fact that they’ve been so -stalwart- in supporting OCaml is worthy of applause. They didn’t just talk about it, and they didn’t just use OCaml for their own needs without trying to help the community.
OK, I’ll stop.
I’d still like to see an OCaml with Rust-like lifetime management, which could get rid of the GC and handle multicore quite safely in the great majority of circumstances…
Although I still want to see Algebraic Effects come in to OCaml… ^.^;
How would this be much different from… Rust itself?
Apparently the Haskell world is working towards this goal with its Linear Types project; I’m not sure about its progress, though.
Their type systems have overlap, but it’s not complete and OCaml can still safely type things that Rust cannot do as easily or at all.
I’ve been watching that for a few years but it seems dead as far as I’ve seen…
Ooo a new thing to read. ^.^
it seems dead as far as I’ve seen…
Seems not so dead: the “linear types” page was last edited by Arnaud Spiwack 1 week ago.
Also, the last comment on the GHC proposal was 6 days ago: https://github.com/ghc-proposals/ghc-proposals/pull/111#issuecomment-510281265
Not quite dead, even by my standards, haha.
Ooo that’s nice to see work on it! I remember following it for a bit a few years ago.
You can already marshal/unmarshal to/from Bigarrays (have a look at Parmap’s code).
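The Bigarray round trip mentioned above can be sketched with only the standard library (OCaml 4.07 or later, where `Bigarray` is part of the stdlib). This is a minimal illustration, not Parmap’s code: the function names are hypothetical, and the buffer is an ordinary heap-allocated Bigarray. A process-based setup would instead back it with a memory-mapped file (for example via `Unix.map_file`) so that forked workers can read what another process wrote.

```ocaml
open Bigarray

(* A flat byte buffer living outside the OCaml heap. *)
type buffer = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical helper: serialize [v] and copy the bytes into a fresh Bigarray. *)
let marshal_to_bigarray (v : 'a) : buffer =
  let b = Marshal.to_bytes v [] in
  let n = Bytes.length b in
  let ba = Array1.create char c_layout n in
  for i = 0 to n - 1 do
    ba.{i} <- Bytes.get b i
  done;
  ba

(* Hypothetical helper: copy the bytes back out and deserialize them.
   Marshal is not type-safe, so the caller must know the stored type. *)
let unmarshal_from_bigarray (ba : buffer) : 'a =
  let n = Array1.dim ba in
  let b = Bytes.create n in
  for i = 0 to n - 1 do
    Bytes.set b i ba.{i}
  done;
  Marshal.from_bytes b 0
```

Usage: `let (xs : int list) = unmarshal_from_bigarray (marshal_to_bigarray [1; 2; 3])` gives back `[1; 2; 3]`. Reading the buffer back at the wrong type is undefined behaviour, so both sides must agree on what was stored.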
So, I don’t think we need bindings to Ach in OCaml.
Chet’s thoughtful comments are a good reminder of the challenges facing multicore. From my interactions with OCaml maintainers and researchers in both Paris and the UK, I can tell you that they are well aware that language support for shared-memory parallelism does not stop at making the GC multicore.
I am worried to see so many off-topic comments pop up here and in other topics on the same theme, “What can OCaml learn from Rust?”. It is not a bad question. Unfortunately, the form chosen means that these comments are not attracting quality answers. The question might be more suitable for dedicated topics, and better-quality replies might be obtained if more effort went into researching and asking such questions.
Lastly, I would like to react to the sentiment from the original post that OCaml is late to the multicore party. I prefer to see it as an opportunity to have a cleaner concurrency and parallelism story than other languages that started the race sooner, when fewer solutions were known.
A good blog post about a similar goal, bringing multithreading and multicore support out of the box to the Julia language: https://julialang.org/blog/2019/07/multithreading
Maybe I can mention it for you: Sawzall?