To answer your question “does de-allocation creates a side-effect?”:
To state the obvious: if you care about the memory consumption of your program, then you care about the side-effect of de-allocation, and this indeed voids purity.
A language like OCaml lets you reason about de-allocation. Memory is collected when values are no longer reachable. Like in other languages, 1) a value that does not escape and goes out of scope will get collected, and 2) you can reason about when a value escapes and goes out of scope thanks to OCaml respecting the strict evaluation order of value types. OCaml (like other compiled languages) is in fact more precise: it ties the dynamic notion of reachability to the lexical notion of variable occurrence. For instance, in the following:
let x = get_huge_data () in
let z = long_running_function x in
OCaml will be able to collect the value in
x goes out of scope, and thus if possible before
long_running_function returns. Indeed, OCaml performs liveness analysis during compilation, and records the information about variable occurrences in frame descriptors, for consumption by the GC when it scans for roots. In fact, you can rely on call-by-value operational semantics to (loosely) reason that a value no longer appears in a program, and therefore that the corresponding memory will be collected by the GC¹ (Morrisett, Felleisen and Harper, “Abstract Models of Memory Management”). Of course, using lazy or higher-order interfaces (when closures escape; with many idioms they do not) will make it harder to reason about the lifetime of values.
(¹: For OCaml, this is a conjecture I make, for subsets which could be given such operational semantics, and only for native compilation. Morrisett, Felleisen and Harper’s semantics obviously assumes that the results of liveness analysis are made available to the GC, but this is not written, nor is there any mention of the link between liveness analysis and accuracy of garbage collection in Appel’s “Modern Compiler Implementation in C”. I assume that it was part of folklore at the time, though recently I mentioned it to some functional PL researcher and they seemed surprised. I only found it explicitly mentioned in later papers from the OOP community. I checked that everything seems in place for OCaml to allow such reasoning, but only the authors of the original code, @xavierleroy and @damiendoligez, can tell us if this is intended to be part of the language semantics.)
Furthermore, memory is not collected immediately when a value becomes unreachable. Instead:
Short-lived values are allocated contiguously and deallocated in a batch, so that allocating and deallocating short-lived values is very cheap, with additional benefits in terms of cache locality. This replaces stack allocation from languages with explicit memory management.
Longer-lived values are moved to a heap that is scanned incrementally, to ensure a bounded latency. In contrast, naive reference-counting and unique pointers from C++/Rust make you pay the cost of deallocation up-front.
While this is essential for understanding the performance of OCaml programs, from the point of view of deallocation-as-an-effect, the delaying of the collection of unreachable memory can be seen as a runtime optimisation, that does not change the effectful status of deallocation (the memory still gets freed). [The intuition is that an effect can support some degree of reordering without requiring purity, as illustrated by strong monads which can be commutative without being idempotent, one possible definition of purity for semanticists.]
But is de-allocation an effect in practice? Faced with the scepticism and misunderstandings from this thread, I emit two hypotheses:
Memory consumption is not an issue in functional programming, for application areas that interest functional programmers.
Memory management in OCaml is efficient in such a way that programmers do not need to think about it in their day-to-day programming activities in those terms.
Hypothesis 2) could be explained for instance if OCaml programmers are already dealing with effects and thinking about the order in which their code executes (my experience), and are only used to deal with deallocation as an afterthought, e.g. when chasing leaks with a profiler.
Let us turn towards two programming language experiments from the 1990’s that allow me to reject hypothesis 1). Both show what happens when one denies the status of deallocation as an effect controlled by the programmer.
Region-based memory management consisted in allocating in a stack of memory regions deallocated at once, and determined by a whole-program static analysis. Now regarded as a failed idea but successful experiment (i.e. good science!), it taught us a lot about the structure of functional programs in relationship to memory management (see this retrospective). There were some good performance results, but also pathological cases “where lifetimes were not nested or where higher-order functions were used extensively”, sometimes requiring them to be altered to be “region friendly”, which was “time-consuming” and required knowledge of the inference algorithm. In addition, the regions changed unpredictably when the programs evolved, and memory leaks appeared when the compiler inferred too wide regions.
Haskell was (at the time) an experiment with lazy functional programming. Pervasive laziness prevents reasoning about the lifetime of values, and purity is a central assumption used by the compiler for program transformations, which is antithetical with reasoning about deallocation as an effect. It is well-known that naive Haskell code has issues with memory leaks, and that realistic Haskell programs have to follow “best practices” to avoid leaks, by making extensive use of strictness annotations (e.g. bang patterns). Unfortunately, I found it hard to find reliable academic sources about lessons drawn from the experiment like the RBMM retrospective. The best I could find on the topic of memory leaks is the following blog post: https://queue.acm.org/detail.cfm?id=2538488, from a Haskell programmer who wrote in another post (linked from that one) “My suspicion is that many (most?) large Haskell programs have space leaks, but they often go unnoticed”. This is consistent with comments I received from people with Haskell experience (first-hand, one academic and one industrial) and about an industrial Haskell consultant (second-hand) who reportedly commented that their main job was to fix memory leaks (but maybe in jest). Of course, take this with a grain of salt. At least, I believe that the Haskell academic community has accumulated empirical evidence of the extent and manner in which deallocation voids purity assumptions. Having an authoritative source about it would be pretty important to me, given the initial promises of functional programs being more tractable mathematically specifically via “referential transparency” and independence of execution order, whose theoretical justification already looks shaky to me from a semantic point of view. Some parts of the literature continues to promise far-reaching consequences of equational reasoning, without clear statements of limitation of the application domain. I have the impression that the Haskell which is practiced in the real world is very different from what you can read in some academic papers.
The hypothesis that deallocation matters as an effect, and that ML makes it easy to program and reason about effects, seems to me a strong argument explaining OCaml’s predictable and competitive performance.
So, thank you for your healthy scepticism.