Js_of_ocaml: OCaml's values and the structured clone algorithm

Does anyone know if js_of_ocaml’s encoding of OCaml values for reasonable types (ground types, records, arrays, lists, variants, polyvars, not including functions or objects) survives a trip through the JavaScript structured clone algorithm?

That is, can I safely ping-pong OCaml values with web workers without having to serialize them to strings?

I would be inclined to say yes, and I have a few encouraging tests, but I bet someone like @hhugo has a better idea about problems I may be missing.


This will not work with bytes, int64 and bigarrays, which are implemented as objects with a constructor. There is also an issue with exceptions, which rely on physical equality.
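A quick JavaScript sketch of the constructor issue; the `Wrapper` class below is a made-up stand-in, not the runtime’s actual class:

```javascript
// Hypothetical stand-in for a runtime class such as the ones jsoo
// uses for bytes, int64 and bigarrays (not the actual class names).
class Wrapper {
  constructor(v) { this.v = v; }
}

const x = new Wrapper(42);
const y = structuredClone(x);

// The data survives, but the clone is a plain object: the prototype
// chain is lost, and physical identity is of course not preserved.
console.log(y.v);                  // 42
console.log(y instanceof Wrapper); // false
console.log(y === x);              // false
```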
But other values are basically translated to arrays, strings and numbers, so they should indeed safely survive a trip through the structured clone algorithm.
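To make the “basically arrays, strings and numbers” point concrete, here is a sketch; the literal below only illustrates the kind of tagged-array encoding jsoo uses, not its exact output:

```javascript
// Illustrative only: jsoo encodes OCaml blocks roughly as tagged JS
// arrays, so a value like Some (1, 2.5) could come out as a nested
// array carrying numeric tags and fields.
const value = [0, [0, 1, 2.5]]; // hypothetical encoding

// Plain arrays, numbers and strings round-trip through the
// structured clone algorithm without losing any information.
const copy = structuredClone(value);
console.log(JSON.stringify(copy)); // [0,[0,1,2.5]]
```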


Thanks for your answer @vouillon.

Just to make things clear: when you said bytes, that does not include string values, right? (They did seem to round-trip in my tests.)

Right, immutable string values are now mapped to JavaScript strings.


Well, actually you need to use --enable use-js-string for that.
Otherwise, the structured clone algorithm should break string comparison and hashing.


I see. Is there any reason why this is an option? This kind of choice always brings more complexity from a systems perspective (feature interactions in bug reports, expectations in FFI code, etc.).

Also, I’m thinking maybe it would be good to have a best-effort zero-copy primitive, say caml_js_{to,of}_structurally_clonable, that handles the bytes, int64 and bigarray cases and that one can call when sending or receiving such values.
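Purely as an illustration of the idea (none of these names exist in jsoo; `Int64ish` and the `$int64` marker are made up), such a primitive could walk the value and rewrite only the problematic constructor-backed nodes into plain data, then undo the rewrite on the other side:

```javascript
// Made-up stand-in for a constructor-backed runtime value.
class Int64ish {
  constructor(lo, hi) { this.lo = lo; this.hi = hi; }
}

// Replace constructor-backed nodes with plain tagged records so the
// whole value survives a structured clone.
function toClonable(v) {
  if (v instanceof Int64ish) return { $int64: [v.lo, v.hi] };
  if (Array.isArray(v)) return v.map(toClonable);
  return v;
}

// Rebuild the constructor-backed nodes on the receiving side.
function ofClonable(v) {
  if (v && v.$int64) return new Int64ish(v.$int64[0], v.$int64[1]);
  if (Array.isArray(v)) return v.map(ofClonable);
  return v;
}
```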

The reason --enable use-js-string is not the default is that it’s a potentially breaking change. I’ve tried to fix various JS stubs over the years but was never confident enough to make it happen. See Use javascript string by default by hhugo · Pull Request #976 · ocsigen/js_of_ocaml · GitHub


Js_of_ocaml.Json.{output,unsafe_input} does some of this (string and int64), I believe.

Do you have a reasonable idea of who could be affected? I guess most people should be insulated by the various FFI bits, no?

It’s a bit of a chicken-and-egg situation. I suspect that these kinds of options do not get tweaked by most people. If string as JS string is the future, I think js_of_ocaml should start by inverting the default (major release), and a few releases later drop the option.

Indeed, it seems to do funny stuff with the reviver, but of course you don’t want to serialize to a string here; you want to keep the value as is, except when you can’t.

This is related but a bit OT for both the topic and the OCaml forum; still, I’d rather have this in this topic.

I’m just wondering if anyone has experience with web workers. I’m getting miserable performance (70s) using them in contrast to blocking the page UI thread (3s). Even if I subtract the data transfer overhead of 2-3s, that still seems a bit egregious.

Not sure if I’m doing something wrong or if that’s kind of expected; does anyone have a hint?

I’ve used Web Workers in a couple of projects.

One project implemented a simple ray-tracer for DRR computation. The ray-tracer was implemented in C++ and compiled to Asm.js using Emscripten (we also briefly tested compiling to WebAssembly but at least back then we got better performance from Asm.js in Chrome). The input 3D image and output 2D images were stored in memory shared by all web workers (one per CPU thread) and the main JS context. A single broadcast message was sent to the workers to render an image. Performance was comparable to a similar ray-tracer written in C++ and running natively.

Another project where I’m using web workers is my Fωμ sandbox. The program is sent to a web worker for compilation and the compiled program is also run in a web worker. (You can find all the JS code under docs in the project.) Aside from improving responsiveness this makes it easy to kill them. (Then there are a couple of other workers, but they are not that essential.)

In both of these, the web workers are created on startup and then used for computation. IOW, we are not starting new web workers or stopping them on each use. I would expect the overhead of starting and stopping web workers to be significant, as they would have to load and execute all code on every run.
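The long-lived-worker pattern is essentially a job queue in front of a single `Worker` created once. A minimal sketch, where `worker` is anything with `postMessage`/`onmessage` (in a page it would be `new Worker(url)`):

```javascript
// Minimal job queue in front of one long-lived worker. Replies are
// assumed to come back in submission order (one job at a time).
function makeQueue(worker) {
  const pending = [];
  worker.onmessage = (e) => {
    const onResult = pending.shift();
    if (onResult) onResult(e.data);
  };
  return function submit(payload, onResult) {
    pending.push(onResult);
    worker.postMessage(payload);
  };
}
```

In a page, `makeQueue(new Worker("worker.js"))` would be created once at startup and reused for every computation, so the worker’s code is loaded and compiled only once.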

The 2-3s transfer overhead sounds very large. Either the messages are massive or there are lots of them. In the ray-tracer project we noticed that message passing had a fairly large overhead/latency, which is why we used broadcasting. The DRR computation was used both for displaying an interactive real-time projection and as a part of automatic 2D-3D registration, and I recall estimating that sending messages per worker would already have cost a significant amount of time (but I don’t recall the estimated numbers).

So, in summary, assuming that workers are not unnecessarily started/stopped and message-passing overheads are avoided, I would expect good performance.


Thanks for your feedback. I’m not starting a worker on each computation; I have a work queue (gist here) at which I throw work.

Regarding the transfer, it’s one large OCaml value with a lot of geometrical data time series, so I could cope with that. What I find really strange is the massive difference in execution time. I thought maybe workers were throttled in some way, but according to what you write that doesn’t seem to be the case.

However, the way I go about these things is likely not very orthodox, so it could be related to that (?):

  1. I use the same script as the page script; the main function checks for Brr_webworker.Worker.ami and dispatches on the page code (which starts the web worker) or the worker code. Basically a fork mechanism.
  2. I have a dirty hack in place to make that work over the file:// protocol. Stupidly, you can’t invoke your own script over file://; you get a cross-origin error. So I embed my page script in the page itself. This allows me to get the source, make a data URL of it and invoke the worker with that :grimacing: (the nice thing, though, is that if you do so with all resources you get a single double-clickable file “application”).
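For what it’s worth, the data-URL part of that hack boils down to something like this (the `script#main` selector in the comment is made up; adapt to however the source is embedded):

```javascript
// Turn script source text into a data: URL a Worker can be started
// from, sidestepping the file:// same-origin restriction.
function dataUrlOfScript(src) {
  // btoa only handles Latin-1; real code would need to UTF-8 encode
  // the source first (e.g. via TextEncoder) for non-ASCII scripts.
  return "data:text/javascript;base64," + btoa(src);
}

// In a page (not runnable outside a browser):
//   const src = document.querySelector("script#main").textContent;
//   const worker = new Worker(dataUrlOfScript(src));
```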

I found it a bit difficult to investigate what’s really happening, and I lack the time to go deep into this, so for now I’ll likely continue doing it on the UI thread with a bit of setTimeout relaxation.

That doesn’t seem to be the case. In fact, I tried to simply round-trip my data with the worker and then perform the computation on the UI thread again, and I get the same timings and, in fact, wrong results.

It seems that going through the structured clone algorithm breaks the OCaml values in some subtle way, which in turn means that my computation constantly hits slow paths; that explains the slowdown.

It is still unclear for now what breaks; testing for equality between the data and the round-tripped data returns true.

So I had some kind of stub value against which I was physically testing. Of course, these kinds of values get a different address when you cross boundaries, and this had catastrophic effects on the computation in this case.
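A minimal illustration of the failure mode:

```javascript
// A sentinel value tested with physical equality (==) on the OCaml
// side corresponds to === on a JS object; a structured clone is a
// fresh object, so the sentinel test fails after the round trip.
const stub = [0];
const clone = structuredClone(stub);
console.log(clone === stub);       // false: identity is not preserved
console.log(clone[0] === stub[0]); // true: the contents are equal
```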

Conclusion: value your option types :–)

I now get my results in the same amount of time; I’m just a bit disappointed by the high transfer-time overhead.

In case it’s of any help, ArrayBuffer objects can be sent from and to workers without copying, using the transferList argument to postMessage (this is unrelated to SharedArrayBuffer, which is not yet widely supported in browsers).

ArrayBuffers can also be viewed from Bigarrays without copying, and most web APIs support ArrayBuffers (including XHR, WebSocket and FileReader)
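The transfer semantics can be sketched with `structuredClone`’s `transfer` option (available in recent browsers and Node), which mirrors the `transferList` argument of `postMessage`:

```javascript
// Transferring (rather than copying) an ArrayBuffer: the receiver
// gets the buffer and the sender's copy is detached (zero length).
const buf = new ArrayBuffer(8);
const moved = structuredClone(buf, { transfer: [buf] });
console.log(moved.byteLength); // 8
console.log(buf.byteLength);   // 0: the original is detached
```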


Thanks for the tip. Unfortunately, the code to execute works with lists of gg points, not bigarrays, so that won’t really help.

Still, I managed to halve the transfer times by stripping from the arguments I was sending large amounts of data that were not being used by the actual computation in the worker.

For fun, I wanted to compare with what a trip via a single marshalled string would give, but alas you can’t marshal floating points with jsoo.

Actually, you can put your floats in a bigarray and marshalling will work; I had to do it recently. I was also expecting that putting them in a record with only floats would work, but it doesn’t. I believe jsoo could be patched to make it work in that case, as there’s a tag saying it’s a float record.

On a side note, we also wanted to send the marshalled string via a websocket (using Brr client-side and Dream server-side). I thought that using Websocket.send_blob would be enough, but for some reason it wasn’t working (we’ll open an issue soon, after making sure we didn’t miss something else).

The dirty hack to make it work was something like Format.sprintf "%S" marshalled_string on the client side and some Scanf on the server… :sweat_smile:

Bigarrays are backed by typed arrays, so I don’t think you’d need any marshalling here; it should structurally clone cleanly.

I suspect you stumbled over binary vs text issues.

Between strings as byte sequences or as UTF-8, bytes as byte sequences or as UTF-8, bigarrays, JavaScript strings, JavaScript binary strings, JavaScript typed arrays, JavaScript array buffers and JavaScript blobs, it’s easy to confuse oneself (see e.g. Base64 decoding · Issue #18 · dbuenzli/brr · GitHub; maybe a few easier conversion paths are still missing in brr to clarify all that).

If you do, do it rather early than late, I’ll soon make a release.