I’m collecting cases where people modernized data representation either to store them more compactly or to store some extra data in unused bytes. If it is done using unsafe features of OCaml (or similar language runtime), then it is OK. I will state below a few examples that I found, and I hope that you can add some more.
In DSL for relational programming we have values such that every part could be replaced by a logic variable. For example, instead of ground lists
type 'a list =  | (::) of 'a * 'a listwe write everywhere types like:
type 'a logic = | Value of 'a | Var of int (* simplified *) type ('a, 'b) l =  | (::) of 'a * 'b type 'a logic_list = ('a, 'a logic_list) l logic
After that we reuse OCaml values representation as a tree and perform unification of OCaml trees (working with the Obj module is involved).
This representation has a performance drawback: all ground values are tagged with constructor Value and trees become twice large if we count pointers. To avoid the tagging we decided to have an abstract type for logic values that are represented in a more compact way: we store logic variables as before, but we don’t put Value constructors. After unification and substitution these unsafe values could be reified to typed representation above (‘a logic_list, for example)
Another example is from the paper Full reduction at full throttle. There we have a tagged data type which is either a closure or a pair of something.
type head = | Lam of t -> t | Accu of atom * t list
To avoid tagging authors decided to store values without constructors and discriminate two cases by constructor arguments: closures always have a tag Closure_tag and pairs have tag 0.
Previous two cases were about storing data in a more compact form. Here is an example how language runtime could have tagged pointers to store extra information.
Today pointers in our operation systems are byte-aligned, so we usually have three lower bits as zeros. We could reuse those bits to store some type information. OCaml runtime uses only the lowest bit to distinguish immediate values. Chez Scheme (BIBOP runtime), AFAIK, uses more bits to distinguish pairs from other values. As a consequence int pairs in Chez Scheme occupy only two words but in OCaml they occupy three. I think that this knowledge is widespread.
What is less widespread (meaning I was surprised when I read it) is that we could reuse a few bits in the middle of the words if we decide to have double-precision floats as immediate values. Today NaNs are distinguished from non-NaNs by largest 12 bits leaving us 52 bits for pointers. Due to the maximum available memory, 48 bits should be enough for pointers, so we have 4 extra bits for type information in the middle of the word.
I’m curious, which consequences will appear if we redesign OCaml runtime using BIBOP. But it’s probably a topic for another thread.