Handling binary data in OCaml and JavaScript

I have some OCaml code that reads a binary file into a string and then treats it as an array of bytes. This led to some odd errors when I compiled it with jsoo, which I eventually traced to the fact that JavaScript strings aren’t really suited for the purpose, and that I should be using an ArrayBuffer instead. Is there a nice way to write an interface that compiles to the string code under ocamlc and to Typed_array code under jsoo?

Maybe you could get more help if you explained what you were trying to do and what the errors in question were. AFAIK, OCaml strings used as byte buffers will work regardless of which compiler is used.

I’m trying to load a binary file as a string and parse it using bitstring. When I read the file as a string in JavaScript, I see a few extra bytes inserted, which makes me think that when I call readAsBinaryString, JavaScript is using a string encoding that spills some characters into two bytes.
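To make the suspicion concrete, here is a minimal sketch in plain OCaml of how a re-encoding step could add bytes. It assumes (this is not confirmed anywhere in this thread) that somewhere along the way each byte is treated as a code point and written back out as UTF-8. The function name utf8_of_latin1 is made up for the illustration.

```ocaml
(* Hypothetical helper: treat every byte of [s] as a code point with the
   same value and re-encode it as UTF-8. Bytes below 0x80 stay one byte;
   bytes from 0x80 to 0xff become two bytes each. *)
let utf8_of_latin1 (s : string) : string =
  let b = Buffer.create (String.length s * 2) in
  String.iter
    (fun c -> Buffer.add_utf_8_uchar b (Uchar.of_int (Char.code c)))
    s;
  Buffer.contents b

let () =
  let raw = "\x00\x7f\x80\xff" in
  let enc = utf8_of_latin1 raw in
  (* Two of the four input bytes are >= 0x80, so two extra bytes appear. *)
  Printf.printf "%d bytes in, %d bytes out\n"
    (String.length raw) (String.length enc)
```

If the number of extra bytes you see matches the number of input bytes that are 0x80 or above, that would be consistent with this explanation.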

Furthermore, when I was googling to see if anyone else had had this issue, I saw that readAsBinaryString has been deprecated and that I’m supposed to read the file into an ArrayBuffer instead. I’ll experiment with converting the ArrayBuffer to a string and passing it to the original code, but if that doesn’t work either, some sort of #ifdef js_of_ocaml equivalent would be useful.
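One way to get the effect of an #ifdef without a preprocessor is to hide the "where do the bytes come from" part behind a signature and let the build pick the implementation. The sketch below is only an illustration: Byte_source, File_source and Make_parser are hypothetical names, and only the native/bytecode side is shown. A jsoo-side module with the same signature (taking an ArrayBuffer as its handle) would live in a separate file, and how that file gets selected depends on your build setup.

```ocaml
(* Hypothetical interface: the parser only needs "give me all the bytes". *)
module type Byte_source = sig
  type handle
  val read_all : handle -> string
end

(* Native/bytecode implementation: the handle is a file path,
   and the file is read in binary mode. *)
module File_source : Byte_source with type handle = string = struct
  type handle = string

  let read_all path =
    let ic = open_in_bin path in
    Fun.protect
      ~finally:(fun () -> close_in ic)
      (fun () -> really_input_string ic (in_channel_length ic))
end

(* The parsing code is written once, against the signature. *)
module Make_parser (S : Byte_source) = struct
  let first_byte (h : S.handle) : int option =
    let s = S.read_all h in
    if String.length s = 0 then None else Some (Char.code s.[0])
end

module File_parser = Make_parser (File_source)
```

The parsing code never mentions files or ArrayBuffers, so only the small source module differs between the two targets.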

This Stack Overflow comment seems relevant:

The reason behind readAsBinaryString() deprecation is in my opinion the following: the standard for JavaScript strings are DOMString which only accept UTF-8 characters, NOT random binary data. So don’t use readAsBinaryString(), that’s not safe and ECMAScript-compliant at all.

Poking around the jsoo source, I found that runtime/mlString lets me directly create an OCaml string backed by a byte array. I’m now trying to get that playing nicely with gen_js_api, which I believe will solve my issue.

Code snippets, for anyone else facing this issue:

This is from the js.mli file (processed by gen_js_api):

module ArrayBuffer: sig
  type t
  val t_of_js: Ojs.t -> t
  val t_to_js: t -> Ojs.t

  val byte_length: t -> int
end

module Uint8Array: sig
  type t
  val t_of_js: Ojs.t -> t
  val t_to_js: t -> Ojs.t

  val new_uint8_array: ArrayBuffer.t -> t [@@js.new]
  val to_string: t -> string [@@js.cast]
end

And this is from the ml file:

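(* Runtime primitive found in the jsoo source (runtime/mlString):
   builds an OCaml string backed by the given byte array. *)
external str_of_uint8Array: Js.Uint8Array.t -> string = "caml_string_of_array"

(* Wrap the ArrayBuffer in a Uint8Array view, then hand it to the primitive. *)
let str_of_arraybuffer buf =
  let arr = Js.Uint8Array.new_uint8_array buf in
  str_of_uint8Array arr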