Does {x with y = y'} clone the other fields?

Hi, beginner here with a few basic questions, hopefully not too long of a post.

I was trying to write a basic lexer for a programming language interpreter, and I wanted to represent the lexer state with a record type along the lines of this:

type lexer = {
  source: string;
  pos: int;
}

where source is the source code as a string. This way, the state is immutable and we can produce the next state with something like:

let advance s = { s with pos = s.pos + 1; }

probably returning some character/token as well, checking for end-of-string, etc while leaving the source string unchanged. From the examples I’ve seen online, this seems to be a common idiom.

My question is, does advance create a new copy of s.source (in a new memory location) for the returned record, every time it is called? Because if that’s the case, it sounds expensive, especially if the source string is large.
I suspect there’s no cloning, at least with immutable fields. Like, if we go on utop and write:

# let x = "hello";;
# let y = x;;

then maybe there’s no problem in having both x and y point to the same memory location where the value "hello" lives. The value itself could get GC’d once we can GC both x and y. Is this what happens with the source field above?

I’ve read somewhere that all OCaml values other than integers are boxed – is that what comes into play in the previous example? I imagine that if what x holds is just a pointer, then it’s easy to give that same pointer to y, where neither pointer allows mutating the actual "hello" value in memory.
Would this mean that cloning does happen if the source field was an (unboxed) integer? Say here:

# let x = 142857;;
# let y = x;;

does the value 142857 actually get cloned into a new location in memory? Although I would expect cloning integers to be pretty cheap, so maybe it’s not a big deal.

Are there other more complicated types with which field cloning does happen? Does it depend on whether the field is mutable or not? I imagine it could work like in the previous paragraph, but making it so that the pointers actually allow mutation of the value in memory.

Finally, what if for some reason you did want the return value of advance s to hold a fresh clone of the s.source field value, living in a different part of memory? Is this the kind of thing you would use ref for, say like

type lexer_with_ref = {
  source: string ref;
  pos: int;
  (* ... *)
}

let advance_with_ref s = { s with source = ref (!s.source), pos = s.pos + 1 }

or does this just change pointers as well? I guess I don’t get ref too well either. Does assignment with := only change the pointer held by the contents field, or does it mutate actual values in memory?

Actually, is there even any situation where you can take a location in memory holding some non-pointer value and overwrite it with a different value, or is it immutability all the way down?
I imagine you could have every value “exist” in one place, allocate a brand new place for every new value, and just have each binding point to the correct place. This way (I think?) you never really need to clone unboxed values; you just clone a pointer and give it to the receiving binding. Doesn’t this come with an efficiency tradeoff compared to being able to overwrite places for no-longer-needed values?

Sorry if these are all really basic questions – I’m kinda new to this immutable-first style and how it deals with memory management. Thanks for any help.

1 Like

Hi,

Yes! All other fields are copied.

Yes. Every value in OCaml is in one of two categories: integer-like values, which are stored on a single memory word and have their low bit set to 1, and pointers to allocated blocks. (A good reference about that.) Strings fall into that latter category. As a consequence, the functional update (as we call it) of your record only copies two words.

Overall, yes. In reality compiler optimisations make it so that there will not necessarily be a memory location associated to every variable of your program. The compiler will do its best to use as few CPU registers and stack words as possible.

No, not really. A functional update allows performs a “shallow” copy. So mutating a field in the copy will not mutate the original. However, the original and the copy may point to a shared value (in fact they will, since if there are any pointers are just copied), and that shared value may be mutable.

utop # type t = { a : char array; b : int };;
utop # let x = { a = Array.make 5 'c'; b = 42 };;
val x : t = {a = [|'c'; 'c'; 'c'; 'c'; 'c'|]; b = 42}
utop # let y = { x with b = 0 };;
val y : t = {a = [|'c'; 'c'; 'c'; 'c'; 'c'|]; b = 0}
utop # x.a.(2) <- '&';;
- : unit = ()
utop # x;;
- : t = {a = [|'c'; 'c'; '&'; 'c'; 'c'|]; b = 0}
utop # y;;
- : t = {a = [|'c'; 'c'; '&'; 'c'; 'c'|]; b = 0}

'a ref is an ordinary record type (type 'a ref = { mutables contents : 'a } as you probably know), so hopefully the explanations above the layout of values can help clarifying this. Every call to advance_with_ref allocates a new lexer_with_ref pointing to a newly allocated string ref.

(Indeed function ref (not to be confused with type ref :slight_smile: ) is merely:

let ref x = { contents = x }

)

However the pointed string is always the same object, at the same memory location.

It does modify the field, but be aware that mutating a string field merely replaces the pointer with a pointer to a different string, it does not mutate the string itself (what’s more, strings are not mutable).

3 Likes

Not sure to understand your question, but hopefully the explanation about the layout of values of memory, and the semantics of functional update and field mutation, help answering it.

1 Like

Thank you for the comprehensive reply, it’s really helpful! I think most of what I wrote about is much clearer to me now. The crux being this:

The lexer_with_ref example was just me trying to think of a case where we have/want in-place mutation – I was thinking of what it meant exactly for the contents field to be mutable. But I think your array example illustrates it much better. Given that {x with y = y'} only copies a word per field, the shallow-copying behavior lines up with my intuition now; same with the 'a ref case.

I was mostly just trying to follow a train of thought related to the lexer_with_ref example. But yeah, what I was trying to grasp is just some of the basic memory semantics around field mutation, why they work well in OCaml, and how they compare to what some other (imperative) languages do. Thanks again for the explanation!

1 Like