Why doesn’t Facebook sponsor the Multicore project?

Not all problems can be solved with processes.

Take multimedia pipelines as an example: I have a process which takes buffers from multiple UDP sources, parses the MPEG (or whatever) packets stored within the UDP payloads, decodes video, merges frames from various streams, adds effects, etc.

Each step of this pipeline can be done in a parallel thread without any problems, and avoiding deadlocks is quite trivial thanks to the very straightforward architecture: stages simply pass objects between each other, one way or another. That’s how GStreamer and DirectShow work.
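To make that concrete, here is a minimal sketch of such a one-way pipeline, assuming OCaml 5 Domains; the blocking channel, the stage names, and the trivial “decode” step are all made up for illustration:

```ocaml
(* A hand-rolled blocking channel: stages only hand buffers one way,
   so there is no lock-ordering to get wrong. *)
module Chan = struct
  type 'a t = { q : 'a Queue.t; m : Mutex.t; nonempty : Condition.t }

  let create () =
    { q = Queue.create (); m = Mutex.create (); nonempty = Condition.create () }

  let push t x =
    Mutex.lock t.m;
    Queue.push x t.q;
    Condition.signal t.nonempty;
    Mutex.unlock t.m

  let pop t =
    Mutex.lock t.m;
    while Queue.is_empty t.q do Condition.wait t.nonempty t.m done;
    let x = Queue.pop t.q in
    Mutex.unlock t.m;
    x
end

let () =
  let parsed = Chan.create () and decoded = Chan.create () in
  (* Stage 1: stand-in for the UDP parser; integers play the role of buffers. *)
  let parser = Domain.spawn (fun () ->
    for i = 1 to 10 do Chan.push parsed i done) in
  (* Stage 2: a trivial "decoder"; buffers move downstream, never copied. *)
  let decoder = Domain.spawn (fun () ->
    for _ = 1 to 10 do Chan.push decoded (2 * Chan.pop parsed) done) in
  (* Stage 3: the sink runs in the main domain. *)
  for _ = 1 to 10 do Printf.printf "frame %d\n" (Chan.pop decoded) done;
  Domain.join parser;
  Domain.join decoder
```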

Split this pipeline into processes and you are either doomed to mmap hell, or your performance will be dismal due to copying.
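For contrast, here is a minimal sketch of the process-based version, using the stdlib Unix and Marshal modules: every value crossing the boundary is serialized and copied through a pipe, which is exactly the cost described above (the worker’s job here is a made-up stand-in):

```ocaml
(* Fork a worker and ship its result back over a pipe with Marshal. *)
let () =
  let r, w = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Worker: compute, serialize, write, exit. *)
      Unix.close r;
      let oc = Unix.out_channel_of_descr w in
      Marshal.to_channel oc (List.init 5 (fun i -> i * i)) [];
      close_out oc;
      exit 0
  | _pid ->
      Unix.close w;
      let ic = Unix.in_channel_of_descr r in
      (* Everything read back is a fresh copy; nothing is shared. *)
      let (squares : int list) = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.wait ());
      List.iter (Printf.printf "%d ") squares;
      print_newline ()
```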

1 Like

There are 2 separate advantages of multicore from my perspective:

  1. Using processes for parallelism is a pain. Processes are awkward: using them involves calls to the OS which differ between operating systems; communication between processes leans heavily on system calls (for example, fork is cheap on Unix but isn’t on Windows, and pipes and sockets are relatively expensive); it’s not type-safe (or particularly secure), as everything needs to be serialized between processes; and keeping track of process liveness is also painful.

    On the flip side, transitioning from a process-based model to a distributed one is less difficult, assuming we handwave some details away.

    Multicore gives you a way to manage all of that from one OCaml application, and that’s huge. It makes OCaml a far better candidate to be used in low-level applications where it currently has a tough time competing, unless you’re a huge company with the resources and dedicated manpower to set up process-based parallelism just the right way.

  2. While sharding works in many instances, there are applications where you want to share a lot of memory between cores. Doing this via serialization over pipes between processes is not practical, and process-based memory sharing is an even worse experience in OCaml. This is also where the danger of multicore comes in: until you share memory, there’s no problem. But using a shared memory area in OCaml is very limited due to the tracing GC (unlike, say, Python), unless (once again) you’re a huge company like Jane Street or Facebook and therefore have the ability to configure your shared memory area to store well-defined structured data for a specific purpose using a ctypes-like FFI.

    Here, the correct approach is to avoid sharing mutable memory between threads whenever possible. Sharing immutable (or near-immutable) memory, however, provides a huge benefit, as one core can write the data and other cores can read it, amortizing the cost of immutable data structures (see the sketch just below this list). The option to share mutable data exists, but it should be avoided, with minor exceptions for people who know what they’re getting into.
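A minimal sketch of that write-once, read-everywhere pattern, assuming OCaml 5 Domains; the map contents and the four-way split of the key space are arbitrary illustration:

```ocaml
(* Share an immutable map across domains: one writer builds it, then
   readers traverse it with no locks, because nobody mutates it. *)
module IntMap = Map.Make (Int)

let () =
  (* Built once, then only read: safe to share between domains. *)
  let table =
    List.fold_left
      (fun m i -> IntMap.add i (i * i) m)
      IntMap.empty (List.init 1000 Fun.id)
  in
  let readers =
    List.init 4 (fun d ->
      Domain.spawn (fun () ->
        (* Each reader sums its slice of the shared map; no copying. *)
        let sum = ref 0 in
        for i = d * 250 to ((d + 1) * 250) - 1 do
          sum := !sum + IntMap.find i table
        done;
        !sum))
  in
  let total = List.fold_left (fun acc r -> acc + Domain.join r) 0 readers in
  Printf.printf "total = %d\n" total
```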

All in all, I’m really looking forward to multicore, and I think it’ll provide yet another push for OCaml’s adoption.

4 Likes

Um, why? I remember decades ago, the X Windows client lib had a “shm” extension (to speed up comms between client and server when on the same machine). It was implemented by having each end format messages into buffers in the SHM segment and send a pointer/length to the other end, which would -return- that same pointer/length when it was done. So, the -ownership- of the message-regions was managed by sending them via TCP. You could think of the SHM message-regions as just a different way of marshaling messages. I’ve done a similar thing myself for the same reasons.

So one could create a big-ass memory region, and each worker process could mmap it; then the workers could communicate with each other in the normal way, using TCP/RPC, and send “ownership tokens” to transfer ownership of blocks of the mmapped region. In your example, there is a “producer” (source of frames), a “consumer” (sink of frames), and a bunch of stuff in between. The consumer would need to “send” the ownership of consumed frames back to the producer so their blocks can be reused, but otherwise this seems pretty straightforward.
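Here’s a minimal sketch of that scheme, with a file-backed mapping standing in for the SHM segment and a pipe standing in for the TCP connection that carries (offset, length) ownership tokens; the file name and sizes are made up:

```ocaml
(* One shared mapping, created before fork; ownership of blocks inside
   it is transferred by sending tokens over an ordinary pipe. *)
let () =
  let size = 1 lsl 20 in
  let fd = Unix.openfile "/tmp/shm-demo" [ Unix.O_RDWR; Unix.O_CREAT ] 0o600 in
  Unix.ftruncate fd size;
  let region =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.char Bigarray.c_layout true [| size |])
  in
  let r, w = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Producer: write a message into its block, then send the token.
         After the send it never touches that block again. *)
      let msg = "hello through shared memory" in
      String.iteri (fun i c -> region.{i} <- c) msg;
      let token = Printf.sprintf "%d %d\n" 0 (String.length msg) in
      ignore (Unix.write_substring w token 0 (String.length token));
      exit 0
  | _pid ->
      (* Consumer: receiving the token is what transfers ownership, so
         reading the block needs no further synchronization. *)
      let buf = Bytes.create 64 in
      let n = Unix.read r buf 0 64 in
      Scanf.sscanf (Bytes.sub_string buf 0 n) " %d %d" (fun off len ->
          print_endline (String.init len (fun i -> region.{off + i})));
      ignore (Unix.wait ())
```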

2 Likes

Because shared memory is hell? Using processes with shared memory instead of threads is simply dead wrong. If you want shared memory, you want threads (in most cases). Sharing memory between a bunch of processes complicates the issue beyond any reasonable limit.

and send “ownership tokens” to transfer ownership of blocks

What if a process needs more space? You need an orchestrating process which manages memory allocation and notifies the other processes that a new page is available.

And you need such extreme complexity for what reason, again? To solve a problem long since solved with fibers/threads? That doesn’t make any sense.

1 Like

I think it’s good that we have such vigorous discussions! :grin: And I’m sure that I’m not going to convince you; I’m just writing down the counter-arguments.

Um, some responses:

(1) Opinions differ on whether it is shared memory, or concurrent access to shared memory, that is the problem. The singular attribute that makes the X-SHM protocol so tractable is that each message-region is “owned” by a single process, and other processes neither read nor write that region. Also, keeping this distinction between “memory that is sharable by other threads” and “memory that is private to this thread” makes it much simpler to write code.

Also, I’ll note that threading on UNIX came after processes and shared memory. Long after, as a matter of fact.

(2) Since these ownership tokens are for decent-sized blocks, it’s straightforward to include shm-ids in them (along with, probably, suggested addresses at which to map them). Then there’s no need for an orchestrator process; any process can create new (preferably large) segments, not at the granularity of pages, and pass chunks of ’em around. The only thing that’s needed is an “eventual parent” that accumulates all the shm-ids, so it can delete them all after the computation is complete.

(3) In any memory-intensive application, it’s necessary to explicitly manage the largest class of memory, outside the heap. This is imperative for performance. And while GC “researchers” have been telling us (and me) that the next great advance in GC will render such explicit (“memory pooling”) management superfluous for … 30+ years, it is as necessary today as it was in … 1989. In Java we know these things as “ByteBuffers”, IIRC. They’re outside the heap, and if you’re going to do high-performance work, you learn to use ’em.
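The OCaml analogue of that off-heap discipline is a pool of Bigarrays: their payloads live in malloc’d memory outside the GC-traced heap, so a large pool of buffers adds no tracing work. A minimal sketch (this pool API is made up, not any existing library):

```ocaml
(* A tiny off-heap buffer pool built on Bigarray. *)
module Pool = struct
  type buf = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t
  type t = { size : int; free : buf Queue.t }

  let create ~size ~count =
    let free = Queue.create () in
    for _ = 1 to count do
      Queue.push (Bigarray.Array1.create Bigarray.char Bigarray.c_layout size) free
    done;
    { size; free }

  (* Reuse buffers instead of allocating: the payload never moves and
     the GC only traces a small handle, not [size] bytes of data. *)
  let acquire t =
    if Queue.is_empty t.free
    then Bigarray.Array1.create Bigarray.char Bigarray.c_layout t.size
    else Queue.pop t.free

  let release t buf = Queue.push buf t.free
end

let () =
  let pool = Pool.create ~size:(1 lsl 16) ~count:8 in
  let b = Pool.acquire pool in
  Bigarray.Array1.fill b 'x';
  Printf.printf "first byte: %c\n" b.{0};
  Pool.release pool b
```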

1 Like

Wow, I go away for a few days and return to this forum to find I’ve entered a time portal back to 2012 :slight_smile:

@XVilka, you’ve been around here for long enough that you should know that the title of this post is simply incorrect, and I dislike answering loaded questions. As @gemmag notes, multicore OCaml wouldn’t exist without Jane Street’s sponsorship over many years of hard work. Please edit the title of this post to correct it for the record.

You’ve also posted in another thread just 15 hours ago in a way that shows you are aware of the active multicore PRs on the OCaml issue tracker, so I’m confused by the implication that there is no one working on multicore OCaml. Are you simply disappointed that it isn’t finished and shipping overnight?

I understand your desire to just solve your problem and have parallelism for performance. Some thoughts:

  • There are active multicore PRs that are complex, affect all architectures and distributions, and require extensive testing and feedback. You can for example look at ocaml/ocaml#8713 and help verify that it doesn’t regress on your codebases.

  • You’ve posted about performance problems you’re seeing, but haven’t really followed up on that with any constructive feedback. It could well be the case that multicore will help with parallel access to some large shared-memory structure. It’s a pretty good time to profile your application and to see if it’s a good candidate for implementation within the multicore OCaml branches.

  • Multicore OCaml is making steady forward progress, but requires painstaking benchmarking and careful design to ensure we don’t mess up the lovely single core experience that has served us so well for the past few decades. The reality of the work is that we spend most of our days poring over benchmark results at the moment to understand the multivariate effects of even the smallest changes in the runtime. Take a look at https://github.com/ocaml-bench/sandmark – well-explained macrobenchmark contributions are welcome here.

This thread is so far full of rather well-trodden discussions that we’ve seen many times over the years. I’d encourage you all to look at the exciting PRs flowing into ocaml/ocaml at the moment and to get involved with testing them and providing concrete feedback to help multicore ship instead!

Generally as a contribution rule, if you see a PR that has been lingering for a while and want to help get it merged, do not just post a “ping” comment on that issue. Instead, take a few moments to clone the PR and build it, check its status against the current master, and see if you can post even a short update of your results along with your query. This will contribute to the PR – even a little more new information is often useful. I’m looking forward to seeing more testing feedback on our various GitHub trackers!

24 Likes
  • Corrected the post.
  • Regarding the “Working with a huge data chunks” topic: it is postponed for a while; I’ve focused my efforts on porting my codebase and dependencies to 4.08, to improve the exceptions experience.
  • OK, will try to keep myself away from “pings”.
2 Likes

At my current day job, the favorite tool for this is Ach. It’s sufficiently cool that I’ve half a mind to whip up an OCaml interface to it. Alas, it is a hot summer here in San Francisco, and I have too many wonderful things to do outside the house in my copious spare time to take on another hobby project. Maybe someone else will pick up this idea and run with it.

You have to look at what Facebook uses OCaml for first.

Here are some publicly known projects that Facebook uses OCaml for:

  1. Hack is written in OCaml. Their entire backend is built upon it, so I don’t think multicore has much value to them on this front. There is rehp, which is essentially an OCaml -> Hack compiler, so they can reuse code (like validations) on both the backend and the frontend.

  2. Flow. Again, the compiler is written in OCaml, but given Flow’s declining user base, I don’t think there will be further investment from Facebook in Flow.

  3. Pfff is a code analysis project.

As you can see, Facebook doesn’t have anything that really needs the power of multicore.

Code analysis tools (Pfff and Infer) are exactly the kind of tools that need multicore to work properly on large codebases.

Is this project still alive?

Hack uses workarounds because OCaml doesn’t have built-in multicore support. See:

There was an article about this shared memory system too, but I can’t find it.

3 Likes

To be honest, BAP works like this not because OCaml lacks multicore. Imagine that we had multicore OCaml (or that BAP were written in Haskell or F#), and that the core of BAP were using all those threads and suchlike. You would still be using one core to analyse a 100 MB file. So that would be really depressing.

The problem with program analysis is that it is hard, if not impossible, to parallelize. The good news is that there are still some options (which, as always, come with a price). In BAP 2.0 we have a fresh new incremental disassembler, so you can now run disassembly on several pieces of code in parallel and then join the results (and, depending on the connectedness of the control flow graph, this could make things either faster or slower than the single-threaded implementation).

And yes, we’re planning, eventually, some time after the BAP 2.0 release, to parallelize it. But under no circumstances would we even consider using shared-memory parallelism for that, for many reasons (including all the reasons @Chet_Murthy has mentioned, plus the fact that the heap is already the scarcest resource we have, so we actually need parallelism to split our heaps).

To summarize, OCaml multicore is definitely not a road-blocker for BAP’s parallelism, nor is it a dependency.

To me personally, the main output of OCaml multicore is not the multicore itself, but rather the effect system, which is an important development in type theory and static analysis. And I wish those projects could actually be split, because one is unnecessarily blocking the other.

12 Likes

TL;DR: the notion of a “function” is itself undecidable, so yes, if we had an oracle that could tell us “these are the functions”, we could run our analysis in parallel, but in real life no functions are given to us.

And here comes the main difference from regular program analysis. In binary analysis (or reverse engineering in general) we don’t have the ground truth of the control flow graph, so we have to start with zero knowledge and repeatedly apply our analysis function until a fixed point is reached. Given that the analysis is inherently an undecidable function, there is no true fixed point, thus any graph that we produce is an under-approximation. For example, an analysis of some initial approximation of a subroutine may yield a result which changes the set of edges in the graph, which ends up repartitioning the graph into a different quotient set of subroutines, and so on. If we apply classical map/reduce here, we may end up in a situation where each worker invalidates the results of all the other workers, so instead of a 1/n improvement we get a Γ(n,1) slowdown (O(n!) in big-O notation) in the naive implementation.

Therefore it is a big research question how to parallelize a fixed-point computation. But this is not to say that BAP couldn’t benefit from running some computations in parallel. Fortunately, most programs (especially well-behaved ones) are linear in nature and do not exhibit a level of mutual recursion that would render parallel reasoning useless. But we definitely do not need multicore support to implement this.
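To make the analyze-in-parallel-then-join flavor concrete, here is a minimal sketch, assuming OCaml 5 Domains. The “analysis” is a made-up monotone step over integer sets, standing in for per-region disassembly; the outer loop re-runs every region whenever the joined facts grow, which is the invalidation problem above in miniature:

```ocaml
module S = Set.Make (Int)

(* A made-up monotone step: each region contributes the successor of
   every fact it can currently see. *)
let analyze region facts =
  List.fold_left
    (fun acc i -> if S.mem i facts then S.add (i + 1) acc else acc)
    facts region

let fixpoint regions init =
  let rec loop facts =
    (* Analyze every region in parallel against the same snapshot... *)
    let partials =
      List.map (fun r -> Domain.spawn (fun () -> analyze r facts)) regions
      |> List.map Domain.join
    in
    (* ...then join the partial results. *)
    let facts' = List.fold_left S.union facts partials in
    (* New facts may invalidate earlier per-region results, so iterate. *)
    if S.equal facts facts' then facts else loop facts'
  in
  loop init

let () =
  let regions = [ [ 0; 1; 2 ]; [ 3; 4 ]; [ 5 ] ] in
  let result = fixpoint regions (S.singleton 0) in
  S.iter (Printf.printf "%d ") result;
  print_newline ()
```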

4 Likes

To answer the specific question “Why doesn’t Facebook sponsor the Multicore project?”: to an outsider like me, it appears that money-related decision-making within Facebook is complex and unpredictable. Many great people within Facebook are using OCaml and/or Reason, and they would of course like to support the external projects that are important to the OCaml and Reason communities, but it looks like they are not able to find the management level at which to make these things happen: for example, Facebook was not in the list of sponsors of this year’s ReasonConf. This is a bit strange, but companies that large are always strange in many ways; I guess we just have to live with it.

(It also makes it easier to realize how lucky we are within the OCaml community to have companies that are willing to fund projects of interest to the wider community and that are extremely approachable.)

12 Likes

[avsm@ is right, very, very right, that we should try to stay on-topic. mea culpa, mea maxima culpa. So heeere goes]

I’ve spent some time in large companies that have external developer-focused marketing presence. Some data-points:

[second-hand] (1) it is well-known that even though MSFT developed the CLR, major portions of the Windows team pushed-back on it, b/c they did not believe it adequately managed memory. It was explained to me that sure/sure/sure maybe the CLR is fine for many things, but for system services, its memory-profligacy (and probably also cache-unfriendliness) was unacceptable.

[first-hand, saw it myself] (2) we all hear about how Google is so heavily into Golang. Yeah. Right. Sure. Fine. Bye. When I worked there, there were enormous factions of folks who would never use Golang, and while sure, some systems were written in Golang, THE language you had to know wasn’t Golang: it was C++. With Python in second place, and another language I cannot mention in third.

I mean … sheesh. The external marketing was incredible [in the sense of “not credible”] compared to the real experience.

So, to respond to the last sentence of the first paragraph: yes, these big companies are feudal empires, and they contain multitudes. The thing is, the people who come to you from the ReasonML camp in FB have NO GOOD REASON to tell you about all this: they want you to think of FB as a place to work, and as a vital contributor to OCaml. So of course they’re gonna pretend that FB is 100% all-in on OCaml. As I related above, that’s pretty much what the Golang sycophants pretend is happening at Google.

My first manager in IBM (in 1995) once told me something: “you don’t believe IBM’s marketing, b/c you know the inside story; why do you so eagerly accept other companies’ marketing?”

He. Was. Right.

Don’t believe marketing from folks from I/T companies: they have every reason to deceive you.

I would be remiss if I did not address the last para. Jane Street is exemplary, and regardless of what each of us might think about the technical merits of their software, the fact that they’ve been so -stalwart- in supporting OCaml is worthy of applause. They didn’t just talk about it, and they didn’t just use OCaml for their own needs without trying to help the community.

OK, I’ll stop.

11 Likes

I’d still like to see an OCaml with Rust-like lifetime management; it could get rid of the GC and handle multicore quite safely in the great majority of circumstances…

Although I still want to see Algebraic Effects come into OCaml… ^.^;

How would this be much different from… Rust itself?

3 Likes

Apparently the Haskell world is working towards this goal with their linear types project, though I’m not sure of its progress.

1 Like

See e.g. https://github.com/pikatchu/LinearML/wiki/Tutorial

1 Like

Their type systems overlap, but not completely, and OCaml can still safely type things that Rust cannot handle as easily, or at all.

I’ve been watching that for a few years, but it seems dead as far as I can tell…

Ooo a new thing to read. ^.^