Experience report: making a JavaScript library from an OCaml library with js_of_ocaml

I have recently compiled our whole codebase with js_of_ocaml, producing a PDF processing library which can be used from JavaScript programs on the server, or within the browser.

I started with no knowledge of JavaScript or its ecosystem, and the js_of_ocaml examples don’t touch on this particular use case, so I thought it would be useful to write up a quick experience report. I’m still pretty naive on the topic, so do take it with a pinch of salt. Corrections welcome.

Perhaps you could make your favourite OCaml library accessible from JavaScript?

Source code: Coherentpdf.js.

Existing Code

For this project, I used the OCaml source code from the PDF library CamlPDF, the command line PDF tools CPDF, and the C interface to CPDF, CPDFLIB. CPDFLIB is an API allowing access to the PDF library from C via the OCaml FFI (and, therefore, from .NET, Java, Python etc).

CamlPDF source | CPDF source | Cpdflib source

This code is almost all OCaml, with a small amount of C code for cryptography and compression.

We will be recompiling all this code with js_of_ocaml, to yield a JavaScript library which can access all the functions in CPDFLIB. Since CPDFLIB is a flat API with no direct access to OCaml data structures, this should be easy to access from JavaScript. Building a JavaScript library which had to directly manipulate OCaml data structures would be more difficult.

Compilation procedure

Our build procedure will be simple. Copy all the OCaml and C source files from CamlPDF, CPDF and CPDFLIB into our build directory and then:

  • compile and link a bytecode executable with ocamlc using a makefile
  • call js_of_ocaml to turn the resulting bytecode file into JavaScript

(dune can also be used for both these steps)

Why the C files? Because we need them to link the bytecode executable: the C code will be thrown away by js_of_ocaml, but we must get the bytecode linked.

Js_of_ocaml informs us that we have missing primitives: these are the functions implemented by the C code, which we will need to replace with JavaScript later. But the thing builds, and a couple of megabytes of JavaScript is produced. It doesn’t do anything yet, because we have no way of calling into it. But we can load it as a module in node even at this stage.

If we miss out the cpdflib source files, though, and just include camlpdf and cpdf, we get the cpdf command line tools, which we can run in node as a command line tool, and they work just fine (if we don’t use compressed or encrypted files, of course):

$ node cpdf.js -pages cpdfjsmanual.pdf
152

Magic! About five or six times slower than the native code version, but it works with minimal effort. Js_of_ocaml has supplied an alternative set of primitives for input and output, and an alternative runtime, and they replace OCaml’s transparently.

Replacing C with JavaScript

Our C code will be thrown away by js_of_ocaml, so we need to provide alternatives. js_of_ocaml allows us to write them in JavaScript with some special comments. It will then plug each in for its corresponding OCaml external (C symbol in the linked OCaml bytecode). Here’s the replacement for one of CamlZip’s functions, using node’s own zlib library:

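//Provides: camlpdf_caml_zlib_compress
//Requires: caml_bytes_of_array
//Requires: caml_array_of_bytes
function camlpdf_caml_zlib_compress(s)
{
  // node's built-in zlib module, required here so the snippet is self-contained
  var zlib = require('zlib');
  // OCaml bytes -> JavaScript array of byte values -> node Buffer
  var s2 = caml_array_of_bytes(s);
  var buf = Buffer.from(s2);
  // Compress synchronously, then convert the result back to OCaml bytes
  var output = zlib.deflateSync(buf);
  return caml_bytes_of_array(output);
}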

In this case, it was not a direct replacement - the API is different. So we must modify our OCaml code to be compilable in both OCaml and js_of_ocaml, like this:

(* js_of_ocaml only *)
external camlpdf_caml_zlib_compress : string -> string = "camlpdf_caml_zlib_compress"
external camlpdf_caml_zlib_decompress : string -> string = "camlpdf_caml_zlib_decompress"

let is_js =
  Sys.backend_type = Sys.Other "js_of_ocaml"

let encode_flate stream =
  if is_js then
    Pdfio.bytes_of_string (camlpdf_caml_zlib_compress (Pdfio.string_of_bytes stream))
  else
    flate_process (Pdfflate.compress ~level:!flate_level) stream

We also need some fake stubs in our C code to make sure it still compiles in the non-JavaScript case with these new externals:

// So that the code links ok when using js_of_ocaml
char* camlpdf_caml_zlib_decompress(char *s) { return s; }
char* camlpdf_caml_zlib_compress(char *s) { return s; }

This code will never be called, of course: it is simply to make the symbol available.
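
The node zlib calls that the JavaScript stub relies on can be tried in isolation. The following is a minimal sketch, not the library's code: plain Uint8Array values stand in for OCaml byte strings (so no js_of_ocaml runtime helpers are needed), and the names compressBytes and decompressBytes are invented for illustration. It also assumes that the decompression side would use zlib.inflateSync, the counterpart of deflateSync.

```javascript
// Stand-alone sketch of the zlib calls behind the stub, runnable in node.
// compressBytes/decompressBytes are illustrative names, not part of CamlPDF.
const zlib = require('zlib');

// Compress a Uint8Array with zlib (deflate) and return a new Uint8Array.
function compressBytes(bytes) {
  const out = zlib.deflateSync(Buffer.from(bytes));
  return new Uint8Array(out);
}

// Reverse of compressBytes: inflate zlib-compressed data.
function decompressBytes(bytes) {
  const out = zlib.inflateSync(Buffer.from(bytes));
  return new Uint8Array(out);
}

// Round trip on some highly compressible data: 1000 copies of the byte 65.
const original = new Uint8Array(1000).fill(65);
const packed = compressBytes(original);
const unpacked = decompressBytes(packed);
console.log(original.length, packed.length, unpacked.length);
```

A round trip like this is a cheap way to check a replacement primitive before wiring it into the OCaml side.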

Writing the JavaScript interface

Now, we use the provided PPX js_of_ocaml-ppx to provide a JavaScript interface to each function, and export them as functions in our JavaScript module:

Js.export_all
  (object%js
     method fromFile filename userpw =
       checkerror (Cpdflib.fromFile (Js.to_string filename) (Js.to_string userpw))
     ...hundreds more functions...
  )

Notice Js.to_string here. Conversions to and from JavaScript will be required for many data types, for example strings, arrays, bigarrays and so on. In our case, Cpdflib.fromFile returns a simple integer, so its result needs no conversion.

Trying it out

Now that we have the library working, we can try it out by running a JavaScript file with node, or by typing into the node REPL:

//Load coherentpdf.js
const coherentpdf = require('./coherentpdf.js');

var pdf = coherentpdf.fromFile('hello.pdf', '');
var merged = coherentpdf.mergeSimple([pdf, pdf, pdf]);
coherentpdf.toFile(merged, 'merged.pdf', false, false);

coherentpdf.deletePdf(pdf);
coherentpdf.deletePdf(merged);

Compiling for the browser

We can use the browserify tool to bundle up our code and its external libraries, including the parts of the node standard library we use, to make a single JavaScript file for use within a web page. This nearly doubles the size:

2192588 coherentpdf.browser.js
1136687 coherentpdf.js

Here is an example of a web page which uses coherentpdf.js to process a PDF file entirely in the browser. You can choose a file, and it will be processed and a download initiated with the output:

https://coherentpdf.com/coherentpdfjs/index.html

Of course, we have a problem now: in the browser, there are no files to read from or write to - the browser is a sandboxed environment. So we must make sure our API also contains functions to read from and write to PDF files represented as byte arrays.

There is an additional issue: JavaScript in the browser does not cope well with large chunks of synchronous code like ours: processing a big PDF file would lock up the browser (or at least the tab). We must use what is called a “web worker” to run it in the background in another JavaScript environment, and communicate by message-passing only. See the file cpdfworker.js for this code.
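
The message-passing arrangement can be sketched as follows. This is not the contents of cpdfworker.js: the message shape ({ id, op, args }) and the function names are assumptions made for illustration, and a tiny fake library stands in for coherentpdf so that the sketch runs on its own in node. The browser wiring is shown only in comments.

```javascript
// Sketch of a request/response protocol for running a synchronous library
// inside a web worker. The message format here is hypothetical.

// Build a handler that looks up the requested operation on `lib`, runs it
// synchronously, and returns a reply object tagged with the request id.
function makeHandler(lib) {
  return function handle(msg) {
    try {
      if (typeof lib[msg.op] !== 'function') {
        throw new Error('unknown operation: ' + msg.op);
      }
      return { id: msg.id, ok: true, result: lib[msg.op](...msg.args) };
    } catch (e) {
      // Errors are turned into data, since exceptions cannot cross the
      // worker boundary directly.
      return { id: msg.id, ok: false, error: String(e.message || e) };
    }
  };
}

// Fake stand-in library: one operation that returns the length of a byte array.
const fakeLib = { countBytes: (data) => data.length };
const handle = makeHandler(fakeLib);

const reply = handle({ id: 1, op: 'countBytes', args: [new Uint8Array(3)] });
const failed = handle({ id: 2, op: 'noSuchOp', args: [] });
console.log(reply, failed);

// In a worker script the wiring would look roughly like this:
//   self.onmessage = (e) => self.postMessage(handle(e.data));
// and on the page:
//   const worker = new Worker('worker.js');
//   worker.onmessage = (e) => { /* match e.data.id to the pending request */ };
//   worker.postMessage({ id: 1, op: 'countBytes', args: [bytes] });
```

Tagging each request with an id lets the page match replies to requests, since the replies arrive asynchronously.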

Minification

The uglify-js tool can be used to minify the JavaScript for deployment on the web:

2192588 coherentpdf.browser.js
1324660 coherentpdf.browser.min.js
1136687 coherentpdf.js
849949  coherentpdf.min.js

We are now down to 1.3MB. Many web servers can serve gzip’d content too, so that helps further, and we get down to 514KB actually sent over the web.
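
The effect of gzip can be estimated locally before deploying. Here is a minimal sketch using node's built-in zlib module; the helper name gzippedSize is invented here, and the sample input is just repetitive text. The figure a real server sends will depend on its compression settings.

```javascript
const zlib = require('zlib');

// Return the size in bytes of `data` (a Buffer or string) after gzip compression.
function gzippedSize(data) {
  return zlib.gzipSync(data).length;
}

// Demonstration on repetitive, JavaScript-like input.
const sample = Buffer.from('function f(x) { return x + 1; }\n'.repeat(1000));
console.log(sample.length, gzippedSize(sample));

// For a real file, something like:
//   gzippedSize(require('fs').readFileSync('coherentpdf.browser.min.js'))
```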

Documentation

Because we built our library from within OCaml, and had it generated for us by js_of_ocaml-ppx, there is no place to put the docstrings. So, unfortunately, we must write a separate JavaScript source file with empty functions like this:

/** Returns the number of pages in a given PDF, with given user password. It
tries to do this as fast as possible, without loading the whole file.
@arg {string} password user password
@arg {Uint8Array} data PDF file as a byte array
@return {number} number of pages */
function pagesFastMemory(password, data) {}

Now we can run any standard JavaScript documentation generator over this to produce the HTML documentation.

Publishing the package

Publishing the package on the npm package system is alarmingly easy - a package.json file, mostly autogenerated, is added, and then it is published from the command line. You can see it here:

https://www.npmjs.com/package/coherentpdf

Here is the source, including the package.json:
Coherentpdf.js.

Licensing

Our command line PDF tools and C/.NET/Java/Python APIs are under a commercial license, or alternatively a non-standard “not for commercial use” license. This is tiresome, because this non-standard license prevents them being included in, for example, Linux distributions.

For coherentpdf.js, the license is just for the JavaScript output of js_of_ocaml, not the original OCaml source, so it can be given a more standard AGPL license, whilst still being available for purchase, and without opening up the commercial OCaml code. The AGPL isn’t everyone’s favourite license, of course, but we start there for now.

To do

What doesn’t work yet? Just two things:

  • The default stack available in many browsers is small (and some have a smaller stack for web workers), so coherentpdf.js can choke on very large PDF files. This is not a problem in node on the server. This will have to be fixed by modifications to the OCaml code itself.

  • js_of_ocaml installs its own top-level error handler, with the unfortunate side-effect that any error in the node REPL after the module is loaded - even a syntax error - causes the REPL to exit. I plan to produce a patch to make this behaviour optional.
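
As an illustration of the kind of change the first item calls for, here is a generic sketch of replacing deep recursion with an explicit work list, so that the depth is limited by heap memory and not by the call stack. It is written in JavaScript to keep it runnable in node; the actual fix would be made in the OCaml source, and nothing below is taken from CamlPDF. The node shape ({ value, children }) is invented for the example.

```javascript
// Sum the `value` fields of a tree without recursion, using an explicit stack.
function sumTree(root) {
  let total = 0;
  const pending = [root];
  while (pending.length > 0) {
    const node = pending.pop();
    total += node.value;
    for (const child of node.children) pending.push(child);
  }
  return total;
}

// Build a chain 200000 nodes deep, iteratively. A recursive traversal of a
// structure this deep may exhaust the call stack, depending on the engine.
let deep = { value: 1, children: [] };
for (let i = 0; i < 199999; i++) {
  deep = { value: 1, children: [deep] };
}
console.log(sumTree(deep)); // 200000
```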

Final Remarks

js_of_ocaml is a remarkably solid piece of kit. To be able to recompile fifteen years of accumulated code with just a few changes was very surprising to me.

Thanks to the js_of_ocaml team and others for answering all my questions during this process, and correcting some of my misconceptions.

I’m still very much a JavaScript newbie, and it’s not a language or platform I’ve grown to love, but if you want your OCaml code to run in the browser, or your OCaml library to be available to millions of JavaScript programmers, you might try giving it a go.
