For security reasons it could be desirable that a value that is no longer in use be overwritten or cleared in some other way. This way such a value would not show up in a process’ core dump, for example. String values in particular could benefit from this. Is something like this possible today, and if not, how could the compiler and runtime support such a feature?
You can try to do this with finalizers (which are called when a value becomes dead). However, finalizers are not reliable in the sense that they may sometimes not be called – typically on program exit. I would not trust them to enforce critical security properties.
I think that your best bet is to treat the secure values as resources whose lifetime you manage manually, just like file descriptors and other non-GCed resources. Scrub them explicitly, like you would explicitly close your file descriptors, for example by using Fun.protect (or another control-flow mechanism) and giving up on rich/complex lifecycle protocols for the resource. (This is something that languages with support for RAII-style resource management arguably do better than OCaml.)
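As a minimal sketch of this resource-style handling (assuming hypothetical read_secret and use functions, and a plain Bytes.t as the container), one might write:

(* The secret lives in a Bytes.t and is scrubbed in the [finally] clause,
   whether [use] returns normally or raises. *)
let with_secret ~(read_secret : unit -> bytes) (use : bytes -> 'a) : 'a =
  let secret = read_secret () in
  Fun.protect
    ~finally:(fun () -> Bytes.fill secret 0 (Bytes.length secret) '\000')
    (fun () -> use secret)

The scrub in ~finally is the whole point: the lifetime of the secret is exactly the dynamic extent of use, much like a file descriptor opened and closed around a callback.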
It is a one-liner change. Indeed, the runtime already clears the memory during the sweep phase, but only in debug mode. So, it is just a matter of enabling it always.
But as @gasche wrote, building your security model on the assumption that the garbage collector might have noticed that a block is no longer alive gives a false sense of security. Just overwrite any sensitive block as soon as you know it is no longer needed (despite being still alive).
Strings are immutable. How would you scrub a string when you no longer need it? Obviously you do not want to create a mutable copy which you then scrub; you want to be sure that you scrub the original value.
Use Bytes if you need security.
It is an interesting (and I think unanticipated) use-case that often pops up for managed languages because of the nondeterministic nature of their memory management model. For example, .NET has a specific SecureString class that ensures data is handled deterministically, not copied, and scrubbed upon deletion… Similarly, Oracle’s Java offers GuardedString.
Interestingly, there is this article mentioning how this is generally a hard problem, even with such security measures in place. Also check this SE question, which talks about the two classes mentioned above.
I don’t know what’s worse, having programmers roll their own security measures, or having said security measures by default, providing a false sense of security.
In all cases, scrubbing is definitely more of a guarantee than sitting there and hoping the freed string data is overwritten. Something like:
let unsafe_scrub str =
  (* Overwrites the string's own bytes in place, without allocating a copy. *)
  Bytes.(unsafe_fill (unsafe_of_string str) 0 (String.length str) '\000')
This does not allocate (so it cannot trigger the GC). As long as you ensure this is the only copy of the string in memory, or ensure every copy finalizes with this, things should be relatively more secure. In a very limited sense of “more”.
The problem is that even this doesn’t protect against guessing the string length, and doesn’t scrub the string ASAP. For all intents and purposes it’s still right there. There should definitely be other measures to ensure data anonymization etc… gasche’s suggestion to treat it like a resource seems most useful in practice.
This relies on the implementation detail that Bytes.unsafe_of_string does not create a copy of the string. I believe this use case would need a String.scrub function that acts on the otherwise immutable string to be sure. Otherwise there is no guarantee that a Bytes value derived from the string is not a copy. Likewise, if one is operating on Bytes because they can be scrubbed, any conversion to a string could create a copy, which would then need to be managed.
I suspect there could be more nuance to this in the general case beyond strings.
String.scrub is a big no-no because it makes strings mutable. If you want to have secret data and scrub it eventually, you should use bytes instead, or an outside-the-heap representation (for example bigarrays). Never copy your secret around or convert it to uncontrolled datatypes. In practice you only need to provide your secret to cryptographic APIs anyway; I don’t know of use-cases that would involve a lot of direct manipulation of the secret value (outside the boundaries of the cryptographic layer).
I don’t think bytes (even with a finalizer) would work in this case, because the GC might have moved the value (perhaps several times if the heap is compacted), and I’m not sure there is a guarantee that the old value would have been completely overwritten by other values: a minor-to-major heap move would leave behind a copy in the minor heap, and during compaction in the major heap it is up to the libc allocator what it does with freed memory (it may keep it around and reuse it instead of immediately giving it back to the OS, depending on how big it is).
A bigarray might work, and might be the preferable approach (it doesn’t rely on any GC or libc implementation details and gives you full control over scrubbing), but it comes with the drawback that none of the string-processing routines would work with it, and you might have to create temporary bytes that need scrubbing (although that can be avoided if everything using the secret is converted to use bigarrays, or the API of this SecureString module).
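To make the bigarray option concrete, here is a minimal sketch (alloc_secret and wipe are hypothetical names): the buffer is allocated outside the OCaml heap with Bigarray.Array1.create, so the GC never moves or copies it, and wiping it overwrites the one copy under our control.

(* Off-heap buffer of type (char, int8_unsigned_elt, c_layout) Bigarray.Array1.t *)
let alloc_secret n =
  Bigarray.Array1.create Bigarray.char Bigarray.c_layout n

(* Overwrite every byte; call this as soon as the secret is no longer needed. *)
let wipe secret = Bigarray.Array1.fill secret '\000'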
Oooh, that is a good point. What does “== has defined behaviour on mutable data” imply when we talk about arbitrary moves, though? Is it simply that mutable data is movable but copies are explicit? (e.g. Bytes.copy, Array.copy, ref !r)
The GC may still implicitly copy values around as long as all OCaml-visible references point to the same copy (the others are dead). On the current OCaml 5 runtime there isn’t much of an issue (if you ensure that your values are big enough to be allocated on the major heap directly; there is no compaction on OCaml 5), but out-of-heap values provide you with even more control. (In practice I would expect the crypto API to handle secrets, and be written in another language and thus store its secrets out of heap by default anyway.)
You might be fine just using Bigarray as mentioned (Cstruct uses it so you can use that).
Depending on the case, you might still need a stub to make sure the C compiler doesn’t optimize memset/bzero away (see explicit_bzero(3)).
I would like to understand the threat model for this feature request. Recently macOS made a change to zero out memory that is freed by default (Michael Tsai - Blog - Zeroing Freed Memory). This makes sense for C programs, as malloc returns uninitialised memory, and hence zeroing out on free protects against the program reading earlier writes. Since OCaml always returns initialised values (except for those low-level C API functions which are explicitly documented), I wonder how much value zeroing out freed memory brings to OCaml.
Either way, it would be interesting to see how much overhead this adds to realistic workloads. This could possibly be implemented as an OCAMLRUNPARAM option.
Hmm, don’t Bytes.create and Bigarray.*.create allow you to observe uninitialised memory?
It goes a bit back to the old discussion of “Is it worth protecting anything if an attacker can read your address space?”. Nowadays I’d say yes, because it can help protect against side-channel attacks; the general consensus is that it’s good practice to zero out sensitive data as soon as you can.
The idea is to keep sensitive data like keys in memory only when you need it. The standard steps for, let’s say, negotiating an SSH key could be:
- read private key from disk into some buffer.
- derive shared keys.
- zero out the buffer.
So it’s not about protecting uninitialized data, it’s about protecting initialized data in a sense.
It’s still tricky since you have to know how the I/O layer is handled (you can’t use Input_channel or anything like it): you would do the manual Unix.read into a bigarray directly and make sure the key never leaves there, which might also imply that all of your key-derivation code needs to be very careful about where it holds pieces of the key.
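A rough sketch of those steps, using a plain Bytes.t buffer for simplicity (derive_shared_keys is a placeholder for the actual key-derivation code; a single read is assumed to be enough for a small key file, and reading straight into an off-heap bigarray would avoid even this GC-managed copy but needs a library that wraps read(2) for bigarrays):

let with_private_key path (derive_shared_keys : bytes -> int -> 'a) : 'a =
  let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
  let buf = Bytes.create 4096 in
  Fun.protect
    ~finally:(fun () ->
      (* zero out the buffer and close the fd, even if derivation raises *)
      Bytes.fill buf 0 (Bytes.length buf) '\000';
      Unix.close fd)
    (fun () ->
      let len = Unix.read fd buf 0 (Bytes.length buf) in
      derive_shared_keys buf len)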
I’d just add that I think zeroing on free has a different effect: malloc will still return uninitialized data on OS X in many cases, since it might be a new, just-mapped-in page.
Imho, the nicest thing about always zeroing out freed memory is that it becomes really easy to spot use-after-free bugs: they just blow up more consistently instead of corrupting data.
In OpenBSD we had a similar scheme via malloc_opt (with more sophisticated options) that could be set with an environment variable, so it was opt-in.
I had no specific use case in mind. But security-sensitive code increasingly pays attention to this, and it seems to be a problem for GC’ed languages that are otherwise eliminating notorious problems like buffer overruns. I am therefore interested in how GC’ed languages could address this.
Maybe this needs runtime and type support: for complex values, a bit in the header, similar to the color bits, could indicate that the value should be zeroed by the GC once it becomes unreachable or a copy is created. This suggests it could be a property of all complex types, which would raise the question of type/representation compatibility.
You are right. And I guess Bytes and Bigarrays are indeed the types that you would want to zero out.
Typically you also want to prevent the memory from ending up on disk, using something like mlock or MAP_LOCKED. (See for instance valpackett/secstr: Rust data type suitable for storing sensitive information such as passwords and private keys in memory, featuring constant time equality, mlock and zeroing out. - secstr - Codeberg.org for a Rust equivalent.) In that case you need to allocate it out of the heap and handle it as a resource as suggested by @gasche, e.g. with a bigarray-like interface but with a release function that zeros the memory.
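A rough interface sketch for such a resource (the module type and function names are hypothetical; the mlock part would need a small C stub, since the standard library does not expose it):

(* An out-of-heap secret handled as an explicit resource. *)
module type SECRET = sig
  type t

  (* Allocate [n] bytes outside the OCaml heap (and, ideally, mlock them). *)
  val create : int -> t

  (* Run the callback on the underlying off-heap buffer, without copying it. *)
  val with_buffer :
    t ->
    ((char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t -> 'a) ->
    'a

  (* Zero the memory and release it; using [t] afterwards is an error. *)
  val release : t -> unit
end

Pairing release with Fun.protect then gives the same use-then-scrub discipline discussed earlier in the thread.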
There is more context on this use-case in the LWN article about MAP_EXCLUSIVE.
(edit: not an expert on security/cryptography/etc. If an expert wants to share a paper that advocates better practices, I am curious about it!)
(edit 2: fixed URL)