`bytes` vs `char array`

StrongerXi · March 24, 2021, 7:35pm

To me it seems that bytes is basically char array, but why bother having such a primitive when we could in theory implement a Byte module with an abstract type t = char array? Were there historical reasons or am I missing something else?

Thanks!

dbuenzli · March 24, 2021, 8:02pm

You are missing the memory representation!

'a array is a polymorphic type. This means that each array slot has a pointer to the actual 'a value.

So a char array is an array filled with pointers to individual characters. That’s neither efficient, nor what most system IO APIs expect.

A bytes value is a contiguous sequence of bytes in memory.

In ASCII-art terms, for the string “abc” (not exactly; see my correction below):

+---+---+---+
| . | . | . |
+-|-+-|-+-|-+
  v   v   v
  a   b   c

vs

+---+---+---+
| a | b | c |
+---+---+---+

StrongerXi · March 24, 2021, 8:24pm

Thanks. So it’s purely for performance reasons?

Also, why couldn’t array store a bunch of values directly? Based on my understanding of the runtime, if the value is an integer or char, we just shift it to get the actual value, else if it’s a pointer, we de-ref it to get the actual value, etc. But my understanding of the internals is very rudimentary, so please correct me if I’m wrong!

dbuenzli · March 24, 2021, 8:27pm

And interoperability reasons.

Note that what I wrote above is actually slightly wrong: chars are effectively represented by integers, and integers are unboxed in OCaml, so what you have in the case of char array is:

 +------+------+------+
 | ...a | ...b | ...c |
 +------+------+------+

But the size of the cells of the array is the word size of your machine (i.e. enough to be able to hold a pointer), so you still don’t have the packed representation expected by a C array of bytes.
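The difference in cell size can be observed from OCaml itself. The following is a minimal sketch, not a recommended technique: it uses the unsafe, implementation-specific Obj module only to peek at block sizes, and the helper names char_array_words and bytes_words are made up for this illustration.

```ocaml
(* Size, in machine words, of the heap block behind each value
   (the header word is not counted). Obj is used here only to
   inspect the representation; it is not meant for everyday code. *)
let char_array_words n = Obj.size (Obj.repr (Array.make n 'a'))
let bytes_words n = Obj.size (Obj.repr (Bytes.make n 'a'))

let () =
  let n = 1024 in
  let word_bytes = Sys.word_size / 8 in
  (* One full word per element for the array. *)
  Printf.printf "char array of %d chars: %d bytes of payload\n"
    n (char_array_words n * word_bytes);
  (* One byte per element for bytes, plus a little padding. *)
  Printf.printf "bytes of length %d:     %d bytes of payload\n"
    n (bytes_words n * word_bytes)
```

On a 64-bit native runtime the array should come out at one 8-byte word per char, while the bytes value stays close to one byte per char.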

StrongerXi · March 24, 2021, 8:40pm

Ah, that is a good point.

Would you mind elaborating a little on the “interoperability” part? Or if you could point me to sources, I’m down to dig it up on my own.

Thanks again!

dbuenzli · March 24, 2021, 8:49pm

Note that with a char array you end up wasting 7 bytes per byte on a 64-bit machine, so that quickly becomes costly.

Regarding interoperability. Suppose you want to call the C write(2) function.

The function takes a buffer b to read from and a number n of bytes to write. It reads n contiguous bytes from b, so if you give it a char array it will also read the wasted bytes mentioned above, which is not what you want.
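This is visible in the OCaml I/O functions too: Stdlib's output takes a bytes buffer plus an offset and a length, and Unix.write from the Unix library has the same bytes/offset/length shape. As a rough sketch of what it means in practice, a char array has to be packed into bytes before it can be handed to such a function. The helper name bytes_of_char_array and the temporary file are assumptions made for this example only.

```ocaml
(* Pack a char array into bytes: one byte per element, contiguous. *)
let bytes_of_char_array (a : char array) : bytes =
  Bytes.init (Array.length a) (fun i -> a.(i))

(* Write the packed buffer to a temporary file, then read it back. *)
let round_trip (a : char array) : string =
  let buf = bytes_of_char_array a in
  let path = Filename.temp_file "bytes_demo" ".bin" in
  let oc = open_out_bin path in
  (* output takes the buffer, a start offset and a byte count. *)
  output oc buf 0 (Bytes.length buf);
  close_out oc;
  let ic = open_in_bin path in
  let back = really_input_string ic (in_channel_length ic) in
  close_in ic;
  Sys.remove path;
  back

let () = print_string (round_trip [| 'a'; 'b'; 'c'; '\n' |])
```

The copy in bytes_of_char_array is the price of starting from the unpacked representation; data that is kept in bytes from the start needs no such conversion.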

StrongerXi · March 24, 2021, 8:51pm

I see now. Thank you so much for answering all my questions!

LdBeth · March 25, 2021, 1:54am

It comes down to some design decisions made by the designers of the OCaml language:

OCaml does not have very strong support for overloading. Neither does the bytecode compiler attempt to generate specialized code when a function is polymorphic (although the native compiler can do so in some cases for optimization), nor is Stdlib.Array written in such a way that a more efficient memory representation is chosen at runtime.

The former is analogous to Haskell’s type classes or C++’s templates, and the latter is what dynamically typed or OOP-based languages such as Common Lisp can do. They can have the “packed char array” or even a bitarray rather easily while maintaining a unified interface.

silene · March 25, 2021, 6:18am

Actually, it is, since the standard library already supports packed float arrays. (I am not discussing whether this was a good or bad decision, just that it has been implemented for a long time.)

There is a bit of a technical difficulty when it comes to packed char arrays though. Indeed, packed float arrays rely on the fact that float values are boxed and thus their dynamic type is known at runtime. Since char values are not boxed, the runtime would have no way to know whether Array.make is supposed to create a packed char array or a generic value array. Other Array functions (which receive already created arrays) do not have this issue and would work fine with packed char arrays, were they implemented.
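The boxed/unboxed distinction can be checked with a small sketch. It assumes a compiler built with the default flat float array setting (a compiler configured without it would not pack float arrays), it relies on the unsafe Obj module purely for inspection, and the helper names is_immediate and is_packed_float_array are invented for this example.

```ocaml
(* True when the value is an immediate (unboxed) integer-like value,
   which carries no runtime tag to inspect. *)
let is_immediate (x : 'a) : bool = Obj.is_int (Obj.repr x)

(* True when the array's heap block carries the packed-float tag. *)
let is_packed_float_array (a : 'a array) : bool =
  Obj.tag (Obj.repr a) = Obj.double_array_tag

let () =
  (* opaque_identity keeps the compiler from treating 1.5 as a constant. *)
  let f = Sys.opaque_identity 1.5 in
  Printf.printf "char is immediate:  %b\n" (is_immediate 'a');
  Printf.printf "float is immediate: %b\n" (is_immediate f);
  (* Array.make can look at its float argument and choose the packed
     layout; with a char argument it only sees an integer. *)
  Printf.printf "float array packed: %b\n"
    (is_packed_float_array (Array.make 3 f));
  Printf.printf "char array packed:  %b\n"
    (is_packed_float_array (Array.make 3 'a'))
```

Since a char is indistinguishable from any other immediate integer at runtime, Array.make has nothing to dispatch on, which is the difficulty described above.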
