Hi, beginner here with a few basic questions, hopefully not too long of a post.
I was trying to write a basic lexer for a programming language interpreter, and I wanted to represent the lexer state with a record type along the lines of this:
type lexer = {
source: string;
pos: int;
}
where source
is the source code as a string. This way, the state is immutable and we can produce the next state with something like:
let advance s = { s with pos = s.pos + 1; }
probably returning some character/token as well, checking for end-of-string, etc while leaving the source string unchanged. From the examples I’ve seen online, this seems to be a common idiom.
My question is, does advance
create a new copy of s.source
(in a new memory location) for the returned record, every time it is called? Because if that’s the case, it sounds expensive, especially if the source string is large.
I suspect there’s no cloning, at least with immutable fields. Like, if we go on utop and write:
# let x = "hello";;
# let y = x;;
then maybe there’s no problem in having both x
and y
point to the same memory location where the value "hello"
lives. The value itself could get GC’d once we can GC both x
and y
. Is this what happens with the source
field above?
I’ve read somewhere that all OCaml values other than integers are boxed – is that what comes into play in the previous example? I imagine that if what x
holds is just a pointer, then it’s easy to give that same pointer to y
, where neither pointer allows mutating the actual "hello"
value in memory.
Would this mean that cloning does happen if the source
field was an (unboxed) integer? Say here:
# let x = 142857;;
# let y = x;;
does the value 142857
actually get cloned into a new location in memory? Although I would expect cloning integers to be pretty cheap, so maybe it’s not a big deal.
Are there other more complicated types with which field cloning does happen? Does it depend on whether the field is mutable or not? I imagine it could work like in the previous paragraph, but making it so that the pointers actually allow mutation of the value in memory.
Finally, what if for some reason you did want the return value of advance s
to hold a fresh clone of the s.source
field value, living in a different part of memory? Is this the kind of thing you would use ref
for, say like
type lexer_with_ref = {
source: string ref;
pos: int;
(* ... *)
}
let advance_with_ref s = { s with source = ref (!s.source), pos = s.pos + 1 }
or does this just change pointers as well? I guess I don’t get ref
too well either. Does assignment with :=
only change the pointer held by the contents
field, or does it mutate actual values in memory?
Actually, is there even any situation where you can take a location in memory holding some non-pointer value and overwrite it with a different value, or is it immutability all the way down?
I imagine you could have every value “exist” in one place, allocate a brand new place for every new value, and just have each binding point to the correct place. This way (I think?) you never really need to clone unboxed values; you just clone a pointer and give it to the receiving binding. Doesn’t this come with an efficiency tradeoff compared to being able to overwrite places for no-longer-needed values?
Sorry if these are all really basic questions – I’m kinda new to this immutable-first style and how it deals with memory management. Thanks for any help.