Examples of interesting interactions of mutable and pure code

Recently, I was looking through the implementation of various functions in the popular OCaml stdlib replacement libraries, and found this rather interesting implementation of to_array for Seq in Containers:

let to_array l =
  let len, ls = fold_left (fun (i, acc) x -> (i + 1, x :: acc)) (0, []) l in
  match ls with
    | [] -> [||]
    | init::rest ->
      let a = Array.make len init in
      let idx = len - 2 in
      ignore (List.fold_left (fun i x -> a.(i) <- x; i - 1) idx rest : int);
      a

In particular, the part that caught my attention was the use of List.fold_left (what I would typically consider a “pure” higher order function), with a mutable array assignment to insert elements into the allocated array.

I found that reasoning about the correctness of this kind of code to be quite interesting so I’ve been using sherlocode to try and find other such examples of these kind of interactions, but thought I might also ask here as well:

Does anyone have any other interesting examples of these kinds of code patterns?

1 Like

This example is pretty anodyne: you’re initializing an intrinsically mutable data-structure, and using recursion combinators to do it instead of for-loops. That’s pretty commonplace.

What’s more interesting, is when you use mutable data to solve a problem that could be solved purely-functionally. For instance, the classic (as in: the one we are (or were in the 80s) all taught in first-year PL class) unification algorithm is like that. I’m sure there are a ton more, but this one jumps to mind instantly. There are a ton of examples of externally-functional algorithms that use mutable data internally for performance reasons.

To be clear, I wasn’t trying to say that this was unorthodox or uncommon - rather I found it interesting because it was prototypical OCaml code - an idiomatic mix of functional and imperative code - that was also non-trivial to reason about, because the correctness of the loop inherently relies on mutable state (in particular, the loop invariant for the fold has to reason about the mutable contents of the array - at each iteration, the array shares a suffix of length len - i - 1 with the logical sequence).

But that’s a good point, it might be useful to focus my search on mutable data-structures might be good, but I’m also interested to see if these kind of patterns occur outside of purely linear sequences (I wonder if reasoning about mutable trees and such would lead to more interesting loop invariants).

Hardcaml inherently tries to be functional, but really isn’t.

It has wires which can be imperatively assigned to create cycles in its graph.

But more importantly (and pervasively) every node created in Hardcaml has a unique id. This makes total sense for what we want it to do but makes various traversal algorithms more complicated.

Something that comes up all the time with hashconsed expressions (very rough example):

type expr = {
  id: int;
  view: view
}
and view =
  | App of expr * expr
  | Var of string
  | List of expr list
  | Plus of expr * expr

let make (view:view) : expr = … (* hashconsing *)

(* keyed on expr.id and ==, for O(1) access *)
module Tbl : Hashtbl.S with type key = expr = …

let map_dag (f:expr -> expr) (e:expr) : expr =
  let tbl = Tbl.create 32 in
  let rec loop e =
    match Tbl.find_opt tbl e with
    | Some e' -> e'
    | None ->
      let e' =
        make @@ match e.view with
        | App (f,a) -> App (loop f, loop a)
        | Var _ -> e
        | List l -> List (List.map loop l)
        | Plus (a,b) -> Plus (loop a, loop b)
      in
      let e'' = f e' in
      Tbl.add tbl e e'';
      e''
   in loop e

This particular form of code is absolutely crucial if you deal with expressions that have a lot of sharing (typically in automated theorem proving) and makes the difference between “fast” and “too slow to wait for it to terminate”.

2 Likes

@c-cube points out my mistake!

Now that I look closer at that code, I’m curious why it was written the way it is. It initializes the array in three steps:

  1. reverse the list, compute length
  2. create array, init with head of that reversed list
  3. copy values from rest of reversed list, to array, in (of course) reverse-index-order.

So it’s … n (3-word) blocks of memory allocated-and-discarded in order to initialize one n+1-word array (one word of heap-maintenance information in each block. I wonder why that’s the most intuitive way to do it.

Huh. Thank you for pointing this out!

BTW, are you aware of Chris Okasaki’s "Pure functional Datastructures’ book (and thesis) ? I bet you’d find lots of amusing purely-functional code in there. Which goes in the opposite direction of where you want to explore.

Also, haha, I assume you’re aware of this: Functional Programming Koans, in OCaml by Doug Bagley — Translated by humans

The Koan of Side Effects

A student of FP came to Daniel de Rauglaudre and asked how to achieve FP Nature. Daniel replied, “to achieve FP Nature, you must learn to program without side effects and other imperative features”. So the student went away to learn how to program in this style. He studied very hard so he could rid his programs of references and for-loops. He struggled to only use let bindings and let rec loops. One day, in order to find inspiration, he was studying the code of the Masters, and in reading the source of Camlp4, he saw a number of uses of references and for-loops! He rushed back to Daniel de Rauglaudre and exclaimed, “How can you tell me not to use imperative features when you use them yourself!” Daniel measured the student carefully in his gaze and replied, “But I already know how to program without side-effects.” At that moment the student was enlightened.

ETA: Maybe even worse than 3*n if that pair in the first fold-recursion isn’t eliminated.

3 Likes

I just pushed a commit that improves this code, actually :slight_smile:. I use a ref to compute the length, so the only intermediate data is now the (reversed) list of items, not a list of tuples.

Note that the function takes as input an iterator, and attempts to traverse it only once in order to turn it into an array (in case it’s transient). Pre-computing the length is at odds with that. An alternative would be to use a Vector-like structure, of course, but that will still allocate more than n words, especially if you include the array extraction at the end.

3 Likes

Oh, I didn’t notice the “for Seq” in @Gopiandcode 's original post! Right, now I see why they did it this way. How could this be made more memory-efficient ? How about

  1. copying the seq into a list of arrays, each 1.X the length (where 1.X < 2) of the previous. The list is in reverse-order (a stack) but each array is in normal order.
  2. so then you reverse the stack and walk down it, copying entries to your final array (whose size you computed in step

I don’t know how much intermediate data would be consed, but it seems like you can control that by setting the expansion factor 1.X sufficiently low, and it always beats reversing the seq by consing onto a list.

I wonder if one can get better than that …

1 Like

Now I’m looking back at Seq, and it seems like the difference between Seq and Stream is a great example of where internally imperative code presents what is meant to be used in a purely-functional way. The internals of Stream are … wow, that’s complicated. But externally, with Stream expressions and parsers, it yields something that feels exceedingly functional.

Ah, well.

Ah, thanks for the reference to HardCaml, yep, I guess graph algorithms would also be a good example of this kind of functional + mutable code. When I have the time, I’ll try and skim through it to see if I can find some particularly interesting snippets.

Good point - I guess memoization is also a good source of examples where people use mutability alongside pure code, although in some way for me it feels less interesting, because the loop invariants will always be fairly simple.

Thanks @c-cube for the prompt update, that’s great - now I have yet another piece of code that makes interesting use of mutable references and interesting loop invariants.

Off the top of your head, do you know any other parts of containers/or your other libraries, that might have code with similarly interesting loop invariants. Anything with trees or structures other than sequences/lists?

Yup, I had initially thought of Chris Okasaki’s book, but since I was looking for impure code, I had to abandon that direction.

Thanks @Chet_Murthy , this is great, exactly the kind of interesting code I was looking for - I think the loop invariant in this case would be rather interesting.

You wouldn’t happen to have any other such examples off the top of your head would you? :slightly_smiling_face:

Maybe take a look at the internals of the Stream module (camlp-streams these days). It’s … a little frightening in there, but I think a good example of what you might be looking for.

1 Like

For trees, I’m afraid that I tend to use CPS rather than loops if they’re too big for direct recursion. :frowning:

Finally got round to actually working out a nice encoding of your proposed implementation of the to_array function:

let to_array (s: 'a Seq.t) : 'a array =
  match s () with
  | Nil -> [| |]
  | Cons (hd, tl) ->
    let batch = Array.make 1 hd in
    let (count, batches, last_batch, ind) =
      Seq.fold_left (fun (count, batches, curr_batch, ind) v ->
        let len = Array.length curr_batch in
        if ind = len
        then let new_batch = Array.make (len * 2) v in
          (count + 1, curr_batch :: batches, new_batch, 1)
        else (curr_batch.(ind) <- v; (count + 1, batches, curr_batch, ind + 1))
      ) (1, [], batch, 1) tl in
    let last_batch = Array.sub last_batch 0 ind in
    let batches = last_batch :: batches in
    let res = Array.make count hd in
    let _ =
      List.fold_left (fun count arr ->
        let len = Array.length arr in
        let dst_pos = count - len in
        Array.blit arr 0 res dst_pos len;
        dst_pos
      ) count batches in
    res

(This one doubles the batch size each time it is exceeded, but it’s a trivial matter to make it more conservative)