32-bit native code support for OCaml 5+

Thanks for the summary.

The critical piece I see missing in every commentary about OCaml 5 is that 32-bit native compilation was removed. For me and my small 3-person company that means we won’t adopt OCaml 5 for a very long time (perhaps never [A]).

In general, it would be wise to be upfront to devs and companies about “major feature gaps” between OCaml 4 and OCaml 5. A “major feature gap” IMHO means that a capability was listed on OCaml 4.x as Tier 1 (https://github.com/ocaml/ocaml/tree/4.14#overview) and then was completely removed (not just downgraded from Tier 1 to Tier 2) in OCaml 5.x (https://github.com/ocaml/ocaml/tree/5.1#overview).

Quick summary of why not OCaml 5: my background for the last two decades is in Big Data + AI. A key insight for scaling Big Data has been horizontally sharding data across 32-bit processes, similar to the recommendations you see with Redis (https://docs.redis.com/latest/ri/memory-optimizations/ and https://redis.io/docs/getting-started/faq/#whats-the-redis-memory-footprint). In fact, I have a major effort this year to get OCaml computations into Redis. A key insight for efficient AI is quantization (i.e. dropping from fp64 down to something smaller); a similar thing happens with ECS game programming, where entities are indexed by 16-bit or 32-bit integers. The bright spot is that 64-bit hardware cost is trending down. But in general, I don’t see the need to be memory efficient disappearing.
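
A minimal sketch of the memory argument, assuming OCaml's standard boxed representation in which each list cell is a one-word header plus two one-word fields. `cons_list_bytes` is a hypothetical helper introduced only for this illustration; the measured figure depends on the runtime that executes the program.

```ocaml
(* Each cons cell of an OCaml list occupies 3 words: header, head, tail.
   A word is 4 bytes on a 32-bit runtime and 8 bytes on a 64-bit one. *)
let words_per_cons = 3

(* Hypothetical helper: estimated heap bytes for an [n]-element int list
   under a runtime whose word is [word_bits] wide. *)
let cons_list_bytes ~word_bits n = words_per_cons * n * (word_bits / 8)

let () =
  let n = 1_000_000 in
  (* Build the list at run time and measure its footprint on this runtime. *)
  let l = List.init n (fun i -> i) in
  let measured = Obj.reachable_words (Obj.repr l) * (Sys.word_size / 8) in
  Printf.printf "measured on this %d-bit runtime: %d bytes\n"
    Sys.word_size measured;
  Printf.printf "estimate for a 32-bit layout:    %d bytes\n"
    (cons_list_bytes ~word_bits:32 n);
  Printf.printf "estimate for a 64-bit layout:    %d bytes\n"
    (cons_list_bytes ~word_bits:64 n)
```

Under these assumptions the same pointer-heavy structure costs twice as much on a 64-bit layout, which is the saving that sharding across 32-bit processes is after.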


[A] If it is technically feasible for my team to restore the 32-bit native backend code in OCaml 5 (i.e. a fork), then perhaps it won’t be never. But I detest forks … any number of small code decisions made by someone else in the future can make a fork technically infeasible to maintain.

5 Likes

From what I know of OCaml on 32-bit architectures, there are 2 major issues:

  1. The header is remarkably overloaded. This causes annoying limitations such as limits on the sizes of strings. Most other GC’d languages use at least 64 bits for the header (and perhaps OCaml should as well, even on 32-bit platforms) because 32 bits is just not enough to do anything useful header-wise.
  2. The x86 32-bit architecture is horribly starved for registers. What this means is that as soon as you need to add another global, quick-to-access value, you take a serious hit performance-wise. This isn’t the case for other architectures like ARM though. Perhaps only 32-bit ARM could be restored, if there is sufficient demand.
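
A minimal sketch of where the size limits in point 1 come from, assuming the default header layout (8 tag bits, 2 GC colour bits, the rest for the block size in words) and no reserved header bits. `max_wosize` and `max_string_length` are hypothetical helpers written for this illustration; `Sys.max_array_length` and `Sys.max_string_length` are the standard-library values for the running system.

```ocaml
(* Default header layout: 8 tag bits + 2 GC colour bits; the remaining
   bits hold the block size in words. Assumes no reserved header bits. *)
let max_wosize ~word_bits = (1 lsl (word_bits - 10)) - 1

(* A string fills whole words, and at least one byte is used for padding. *)
let max_string_length ~word_bits =
  ((word_bits / 8) * max_wosize ~word_bits) - 1

let () =
  Printf.printf "32-bit: max array length = %d, max string length = %d\n"
    (max_wosize ~word_bits:32) (max_string_length ~word_bits:32);
  (* The limits of the runtime executing this program. *)
  Printf.printf "this %d-bit runtime: max array length = %d, max string length = %d\n"
    Sys.word_size Sys.max_array_length Sys.max_string_length
```

With a 32-bit header this gives 4194303 array elements and strings of just under 16 MiB (16777211 bytes).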

It makes sense to shard Redis, b/c it’s not a mere memory-based blob-store, so internal pointers can cost a lot of memory. But is switching to 32bit for the entire process actually useful for AI? I was under the impression that almost all machine learning was done on GPUs (or special purpose ASICs like TPUs) these days? And also that almost all ML was done on vectors of floats, not on individual floats? In which case, building a 32-bit runtime isn’t very important?

That is to say, if you want fp32 (or fp16), and you’re going to want it in vectors, then there’s no need to build a 32-bit runtime: you can have your 64bit runtime, with special datatypes and operations over [ETA] vectors of these smaller floats?
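
A minimal sketch of that idea, assuming OCaml 4.07 or later (where `Bigarray` is part of the standard library): a 64-bit runtime holding vectors of unboxed single-precision floats. The names `v32`, `v64` and `dot` are illustrative only.

```ocaml
(* Assumption: OCaml >= 4.07, where Bigarray is part of the standard library. *)
let n = 1_000_000

(* One million floats stored unboxed, outside the OCaml heap. *)
let v32 = Bigarray.Array1.create Bigarray.float32 Bigarray.c_layout n
let v64 = Bigarray.Array1.create Bigarray.float64 Bigarray.c_layout n

(* Dot product; float32 elements are widened to OCaml's 64-bit float
   when read, so the arithmetic itself is done in double precision. *)
let dot a b =
  let acc = ref 0.0 in
  for i = 0 to Bigarray.Array1.dim a - 1 do
    acc := !acc +. (Bigarray.Array1.get a i *. Bigarray.Array1.get b i)
  done;
  !acc

let () =
  Bigarray.Array1.fill v32 0.5;
  Printf.printf "float32 vector: %d bytes\n" (Bigarray.Array1.size_in_bytes v32);
  Printf.printf "float64 vector: %d bytes\n" (Bigarray.Array1.size_in_bytes v64);
  Printf.printf "dot v32 v32 = %g\n" (dot v32 v32)
```

The float32 vector takes 4,000,000 bytes against 8,000,000 for float64, independent of the pointer width of the process.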

ETA2: Upon further reflection, for AI/ML, you absolutely do not want a 32-bit runtime. Why not? B/c sure, you want to use fp32, fp16, fp8 floats, to economize on memory. But the reason you’re doing that is to pack more data into memory, and you’re going to want to have as many gigabytes of memory full of those fp16 floats, as you can. Cutting that memory into 2-3GB chunks, b/c you’re using a 32-bit runtime, is going to be absolutely the wrong thing to do.

I checked this with my AI/ML hacker friend, who confirmed these things.

1 Like

I’m calling out @Chet_Murthy’s loud nonsense publicly because it is flat out wrong and harmful to devs who want to enter AI. You can develop state of the art AI, and then deploy it successfully using a 32-bit compiler. Been there, done that. Don’t let companies who have a vested interest in selling GPU cards for thousands of dollars (or rent hosted GPU/TPUs for hundreds of dollars a month) distract you.

[note: the rest of the post was moderated to calm the discussion down]

3 Likes
  1. Are the CUDA bindings you are using 32-bit or 64-bit?

  2. When you pass data from CPU to GPU, are you (1) passing large tensors of floats (f8, f16, f32, f64), making the 64-bit host irrelevant, or (2) passing blocks of memory with embedded 64-bit pointers? (I’ve never seen anyone do (2) in the PyTorch / CUDA / … libraries I’ve played with.)

  3. The ImageNet dataset is a couple hundred GB. Most modern GPUs have more than 4 GB of memory. How do you even feed the GPU when using a 32-bit CPU host?

This discussion should be split into a separate thread.

2 Likes

I definitely agree with @jbeckford that there are substantial uses for having memory-layout efficient data-structures. Not all AI algorithms go straight to GPUs!

However, I don’t think the direction of travel here should be towards supporting legacy 32-bit instruction set architectures, especially when most of those CPUs are themselves 64-bit capable. There are other options for supporting 32-bit heap pointer layouts (thereby preserving memory efficiency) within a 64-bit instruction set architecture (thereby benefiting from…everything else).

The JVM exposes a number of options here: this article is a good summary of compressed object pointers (“compressed oops”). There is a substantial effort involved in making this approach work with OCaml’s heap value representation, but I think it’s a more realistic research direction to investigate than trying to support 32-bit CPUs for native code generation, especially in the face of multicore parallelism.
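
A minimal sketch of the arithmetic behind compressed pointers, assuming 8-byte object alignment and a 64-bit OCaml runtime (so that `int` can hold the addresses). It illustrates the general technique only, not the OCaml runtime or any particular JVM implementation; `heap_base`, `compress` and `decompress` are hypothetical names.

```ocaml
(* Assumed alignment: every object starts on an 8-byte boundary, so the
   low 3 bits of an offset are always zero and need not be stored. *)
let alignment_shift = 3

(* Turn a full address into a reference that fits in 32 bits,
   relative to [heap_base]. *)
let compress ~heap_base addr =
  let offset = addr - heap_base in
  assert (offset >= 0 && offset land 7 = 0);
  let c = offset lsr alignment_shift in
  assert (c < 1 lsl 32);
  c

(* Recover the full address from a compressed reference. *)
let decompress ~heap_base c = heap_base + (c lsl alignment_shift)

(* Bytes addressable through a 32-bit compressed reference. *)
let addressable_bytes = (1 lsl 32) lsl alignment_shift

let () =
  let heap_base = 0x7f00_0000_0000 in
  let addr = heap_base + 4096 in
  let c = compress ~heap_base addr in
  Printf.printf "compressed = %d, round trip ok = %b\n"
    c (decompress ~heap_base c = addr);
  Printf.printf "addressable: %d GiB\n" (addressable_bytes lsr 30)
```

Under these assumptions a 32-bit reference reaches 32 GiB of heap, at the cost of a shift and an add on every dereference.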

6 Likes

I love the research idea. But let’s be honest, that is years out. And there are more choices than just research or maintain 32-bit native code generation:

  1. 32-bit native code generation in OCaml 5
  2. Do research for 32-bit memory layouts
  3. Be upfront with people that 32-bit native compilation is gone in OCaml 5, and let them make an informed decision whether they still want to adopt OCaml 5

Number 1 is a misallocation of finite resources today. Doesn’t matter who does it.

Number 2 sounds like a long way out. I suspect I’d like to contribute to this effort though once my team gets through its current (sadly deep) backlog.

Today though … what is wrong with what I suggested … number 3?

Number 3 is already done at the top of the README of the ocaml/ocaml GitHub repository (“The core OCaml system: compilers, runtime system, base libraries”).

Did you have anything else in mind?

1 Like

Yep:

  1. ocaml.org
  2. The major blog posts that talk about OCaml 5

Basically, all the venues that newcomers (newcomers to OCaml, not newcomers to software development) will learn about “OCaml”.

I highly doubt, for example, that a newcomer to OCaml will understand the implications of “On 32-bit systems, only the bytecode compiler will be supported” (the tiny sentence on the github page you referenced).

And when I search for “ocaml” on google, the github page is the 4th reference … implying that newcomers don’t come through the github page.

Summary: That one tiny sentence on a number 4 search rank page is insufficient.

1 Like

I don’t want to ignore your use-case, but what are the odds that a
newcomer cares or knows about 32 bit systems, in 2023? I don’t even know
where I would buy laptop/desktop grade hardware that is 32-bits only now.

2 Likes

Can I change your question to “cares or knows about 32 bit compilation or 32 bit memory layout, in 2023?” Sometimes 32-bit compilation is the only way to get a 32-bit memory layout. That tweaked question is sufficiently general to cover normal server-side use cases like running the recommended 32-bit Redis processes on 64-bit hardware.

I’d suggest two easy ways to estimate it:

  1. (Potential users) How many of the top 50 programming languages had support for 32-bit memory layouts, and then subsequently dropped support for it?
  2. (Current users) How many of the companies / dev teams who are active users of OCaml need 32-bit? Be conservative and just assume one (mine), but I’d like you to be fair and weight it by some proxy of activity like contributions.

I don’t want to bias the response, so I would really love if some others (@c-cube?) answered that.

(A better estimate would be to ask the majority of non-OCaml development teams (that would be Python, C, C++, Java and C# development teams according to the TIOBE Index) whether they would adopt a language knowing it can do neither 32-bit compilation nor a 32-bit memory layout. If we could do a Reddit or Hacker News poll without throwing OCaml under the bus, that would be ideal.)


I am beginning to suspect that the reticence to say clearly that 32-bit native code compilation (C1) support is gone comes from a good suspicion of the negative reaction a minority of non-OCaml teams will have. My response is that not being upfront is worse … it will lead to accusations that we have something to hide. We don’t! We are a small community, and we have to pick and choose. So let’s just say it clearly, and move on!

C1: Correction! Apologies for the sloppy noun choice here. I thought the context was clear. “native code generation” is clear.

That would be because 32-bit support is not gone. The bytecode compiler continues to work on 32-bit systems, we conduct best-effort testing to keep the ecosystem working on these platforms, and almost all packages on (e.g.) 32-bit Raspbian will continue to work with OCaml 5 – just more slowly.

32-bit native code generation support is gone in OCaml 5.0, with no plans to bring it back.
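
A minimal sketch of how a program can report which of these situations it is running in, using the standard library’s `Sys.word_size` and `Sys.backend_type`. `backend_name` is a hypothetical helper.

```ocaml
(* Describe the backend that produced the running program. *)
let backend_name = function
  | Sys.Native -> "native code"
  | Sys.Bytecode -> "bytecode"
  | Sys.Other name -> name

let () =
  Printf.printf "%d-bit runtime, %s backend\n"
    Sys.word_size (backend_name Sys.backend_type)
```

Going by the statement above, a 32-bit system running OCaml 5 would be expected to report the bytecode backend.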

5 Likes

I think @jbeckford may have a point in that the status of 32-bit native compilation may not be fully clear to people who do not follow the development of the compiler closely. Accordingly, I opened pull request #12370 on ocaml/ocaml (“README.adoc: clarify status of 32-bit native compilation”) to at least make it completely explicit in the compiler README.

Cheers,
Nicolas

2 Likes

Tbh Jonah is also right that the OCaml website needs this info too. The website itself doesn’t list a matrix of supported architectures for recent OCaml versions. This is a pretty reasonable piece of info for a compiler website to mention.

1 Like

The main reason I’d want OCaml support for 32-bit architectures is embedded SoC platforms. If those are bytecode only, then I suspect that will be okay. The bytecode interpreter is probably more space efficient anyway. Embedded platforms are usually more space constrained than time constrained, and you can trade space for time if you need it by writing hot loops in C/C++ and calling them through the FFI.

I sincerely have no idea! I mostly use OCaml with a bit of rust on the
side. I’m not even sure if rust can target x32, and I certainly have
never considered running 32-bit OCaml on my 64-bit machines.

I also don’t think I know, personally, of any OCaml user who explicitly
needs 32 bits support, aside from you! But that’s a very subjective and
limited viewpoint.

A poll would be extremely interesting! For example I wonder how many
C++ users do that (I doubt python or java users care about that, really
— they have more problems when it comes to memory usage than just the
size of pointers).

In rust I think the usual philosophy to get smaller pointers is to use
u32 handles into a Vec/array. It also can be useful to bypass the borrow
checker in some cases.
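
A minimal sketch of the same handle idea transposed to OCaml, assuming OCaml 4.07 or later: a linked list whose links are 32-bit indices into preallocated Bigarrays rather than native pointers. The `pool` type and the functions `create`, `cons` and `sum` are hypothetical names for this illustration.

```ocaml
open Bigarray

(* Each node costs 8 bytes on any platform: a float32 payload
   plus an int32 handle to the next node. *)
type pool = {
  values : (float, float32_elt, c_layout) Array1.t;  (* payload per node *)
  next : (int32, int32_elt, c_layout) Array1.t;      (* handle of next node *)
  mutable used : int;                                 (* nodes allocated so far *)
}

(* Sentinel handle playing the role of a null pointer. *)
let nil = -1l

let create capacity = {
  values = Array1.create float32 c_layout capacity;
  next = Array1.create int32 c_layout capacity;
  used = 0;
}

(* Allocate a node holding [v] in front of the list [tail]; return its handle. *)
let cons pool v tail =
  if pool.used >= Array1.dim pool.values then failwith "pool full";
  let h = pool.used in
  pool.values.{h} <- v;
  pool.next.{h} <- tail;
  pool.used <- h + 1;
  Int32.of_int h

(* Follow handles until the sentinel, summing the payloads. *)
let rec sum pool h acc =
  if Int32.equal h nil then acc
  else
    let i = Int32.to_int h in
    sum pool pool.next.{i} (acc +. pool.values.{i})

let () =
  let pool = create 1024 in
  let l = cons pool 1.5 (cons pool 2.5 nil) in
  Printf.printf "sum = %g\n" (sum pool l 0.0)
```

The trade-off is that the pool has a fixed capacity and nodes are never reclaimed individually, which is the price of bypassing the garbage collector for this data.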

A poll is simple to do here on this forum. Any interested party should do it :slight_smile:

I regularly use Rust/wasm32. That said, even if OCaml had wasm32 support, I’m not convinced OCaml/wasm32 would be faster than OCaml/jsoo.

In 2018, some expert Coq users would recommend sticking to (native) 32-bit OCaml to reduce the memory footprint of large Coq proofs – see this post by Andrew Appel. Unfortunately I believe that they are now past the point where “large Coq proofs” could reliably fit in 4GiB of memory, so I guess they are back to 64-bit setups now.