OCaml 32-bit memory limit

Hello,
I am working on an XML parser in OCaml. I have a 470 MB file containing 9.6 million XML nodes.

I would like to know the memory limit of a 32-bit OCaml process so that I can optimize my code effectively.

Could you please tell me what that limit is?

Thank you in advance

There’s some discussion of OCaml’s memory representation here: https://dev.realworldocaml.org/runtime-memory-layout.html and here https://caml.inria.fr/pub/docs/manual-ocaml/libref/Bigarray.html.

In particular, on a 32-bit platform you are limited to strings of about 16 MB, 31-bit integers, and arrays of at most 4,194,303 elements. Bigarrays aren’t subject to these limits; they are limited only by the amount of memory your process can address.
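You don’t have to take these numbers on faith: the standard `Sys` module reports the limits of whatever runtime the program is running on. The sketch below prints them, then recomputes the 32-bit figures from the block-header layout (the names `max_array_32`, `max_string_32` and `max_int_32` are just illustrative).

```ocaml
(* Print the size limits of the OCaml runtime this program is running on. *)
let () =
  Printf.printf "word size:         %d bits\n" Sys.word_size;
  Printf.printf "int size:          %d bits (max_int = %d)\n" Sys.int_size max_int;
  Printf.printf "max array length:  %d elements\n" Sys.max_array_length;
  Printf.printf "max string length: %d bytes\n" Sys.max_string_length

(* On a 32-bit runtime each heap block header is one 32-bit word: 8 bits
   hold the tag and 2 bits the GC colour, leaving 22 bits for the block
   size in words. That yields the figures quoted above. *)
let max_array_32 = (1 lsl 22) - 1           (* 4_194_303 elements *)
let max_string_32 = (4 * max_array_32) - 1  (* 16_777_211 bytes, minus the final padding byte *)
let max_int_32 = (1 lsl 30) - 1             (* 1_073_741_823, max_int for 31-bit ints *)

let () =
  Printf.printf "32-bit limits: %d array elements, %d string bytes, max_int %d\n"
    max_array_32 max_string_32 max_int_32
```

On a 64-bit runtime the same header layout leaves 54 bits for the size, which is why these limits effectively disappear there.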

In addition to @jfester’s perfectly correct reply, let me add that the operating system and execution environment may further restrict how much memory a single process can use. The natural limit for a 32-bit platform is 4 GB, but I remember some Linux versions with a 3 GB limit, Windows versions with a 2 GB limit, and Cygwin versions with a 1 GB limit.

If you can use a 64-bit platform, by all means go for it, as it will avoid all these silly limitations.
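To see how close a program actually gets to such a limit, the standard `Gc` module can report the size of the OCaml major heap. Here is a minimal sketch (`heap_bytes` is a hypothetical helper name). Note that it counts only the major heap; the minor heap, the stack, Bigarray data and C allocations also consume the process’s address space.

```ocaml
(* Bytes occupied by the OCaml major heap right now, and its peak so far.
   Gc.quick_stat is cheap: it does not walk the heap. *)
let heap_bytes () =
  let st = Gc.quick_stat () in
  let word = Sys.word_size / 8 in
  (st.Gc.heap_words * word, st.Gc.top_heap_words * word)

let () =
  let current, peak = heap_bytes () in
  Printf.printf "major heap: %d MB now, %d MB peak\n"
    (current / 1_048_576) (peak / 1_048_576)
```

Calling this periodically while loading the file shows whether memory grows with the input, which is exactly what a 32-bit limit punishes.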

Typically, when dealing with files this large, it’s best to process them in a “streaming” manner, so that you only hold a fraction of the file in memory at any one time.

xmlm is such a streaming parser by @dbuenzli.
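To illustrate the streaming idea with nothing but the standard library, here is a rough sketch (not a real XML parser) that counts element start tags while holding only one fixed-size buffer in memory, however large the file is. `count_start_tags` and its `chunk_size` parameter are hypothetical names. It deliberately ignores comments, CDATA sections and attribute values that contain `<`; a proper streaming parser such as xmlm handles all of that.

```ocaml
(* Count element start tags by reading the file in fixed-size chunks.
   Memory use is one buffer of [chunk_size] bytes, independent of file size. *)
let count_start_tags ?(chunk_size = 65536) path =
  let ic = open_in_bin path in
  let buf = Bytes.create chunk_size in
  (* [prev_lt] is true when the last byte examined was '<', even if it
     sat at the very end of the previous chunk. *)
  let rec loop count prev_lt =
    let n = input ic buf 0 chunk_size in
    if n = 0 then count
    else begin
      let total = ref count and lt = ref prev_lt in
      for k = 0 to n - 1 do
        let c = Bytes.get buf k in
        (* '<' followed by '/', '?' or '!' starts an end tag, a processing
           instruction or a comment/DOCTYPE/CDATA, not an element. *)
        if !lt && c <> '/' && c <> '?' && c <> '!' then incr total;
        lt := (c = '<')
      done;
      loop !total !lt
    end
  in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () -> loop 0 false)
```

Usage would be something like `Printf.printf "%d elements\n" (count_start_tags "nodes.xml")`, where `nodes.xml` stands in for your 470 MB file. A real streaming parser works the same way in spirit: it hands you one event at a time and forgets it unless you keep it.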

Hi Chet_Murthy,

Thank you for this suggestion. But could you please give some more details?

Hello,

you should have a look at xmlm (https://erratique.ch/software/xmlm), which @Leonidas pointed to.

This chapter of the PXP manual explains event-based (streaming) parsing:
http://projects.camlcity.org/projects/dl/pxp-1.2.9/doc/manual/html/ref/Intro_events.html

Best regards

I haven’t worked with XML in years, and when I did, I wrote my own parsers, so I can’t point to particular software. @ttamttam has made some suggestions. More generally, XML parsers come in two flavours: DOM parsers, which build the whole document tree in memory, and streaming parsers. You want the latter.

This is (or was) a very active area of development for Java, so you might find good reading material in the Java world.
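To make the DOM-versus-streaming distinction concrete, here is a sketch over a hypothetical, simplified event type. `event`, `tree`, `build_tree` and `count_elements` are illustrative names, not the API of xmlm, PXP or any other library. Both functions consume the same lazy sequence of events; the first keeps every node alive in a tree, the second keeps a single integer.

```ocaml
(* Hypothetical, simplified parser events; real streaming parsers have
   richer event types (attributes, namespaces, DTD, ...). *)
type event = Start of string | End | Text of string

(* DOM style: the whole document becomes an in-memory tree, so memory
   grows with the number of nodes. *)
type tree = Element of string * tree list | Data of string

let build_tree (events : event Seq.t) : tree option =
  (* [stack] holds the open elements, each with its children reversed. *)
  let rec go stack seq =
    match seq (), stack with
    | Seq.Nil, _ -> None (* input ended before the root element closed *)
    | Seq.Cons (Start tag, rest), _ -> go ((tag, []) :: stack) rest
    | Seq.Cons (Text s, rest), (tag, kids) :: up ->
        go ((tag, Data s :: kids) :: up) rest
    | Seq.Cons (Text _, rest), [] -> go [] rest (* ignore text outside the root *)
    | Seq.Cons (End, rest), (tag, kids) :: up -> (
        let node = Element (tag, List.rev kids) in
        match up with
        | [] -> Some node
        | (ptag, pkids) :: up' -> go ((ptag, node :: pkids) :: up') rest)
    | Seq.Cons (End, _), [] -> None (* unbalanced end tag *)
  in
  go [] events

(* Streaming style: fold over the events keeping only the state you need,
   here a counter, so memory stays constant however large the input. *)
let count_elements (events : event Seq.t) : int =
  Seq.fold_left (fun n ev -> match ev with Start _ -> n + 1 | _ -> n) 0 events
```

For a file with 9.6 million nodes, the DOM approach has to fit all of them in the heap at once, while the streaming fold only needs whatever summary state your application actually requires.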