Security issues to look out for in OCaml

Continuing the discussion from Request for comments: What to do with opam packages that have known security vulnerabilities:

That is mostly true, but it is worth keeping in mind that security issues can be present in any language and aren’t limited to memory-safety.

Trying to summarize the security issues that I’ve discovered or helped fix:

  • security properties don’t compose: module A and B can be secure on their own, but you still need to test/fuzz their interaction.
  • look at security bugs fixed in other ecosystems. Chances are the OCaml implementation fell into the same trap as other implementations.

Security issues and suggestions

The low number of security issues shouldn’t make us over-confident. I know I was probably more confident than I should’ve been. Yes, we do have high quality libraries and projects, but we should be careful about drawing conclusions based on 0 (or low) security issues.

I’ll show below some examples of security issues in a project I worked on. IMHO it is useful to see what security issues are possible in OCaml to avoid repeating these mistakes.

Yes, most security bugs in OCaml are deterministic, although that doesn’t mean the process is quick: you may stumble upon more bugs while reviewing exposure or testing a fix.
It also means that when you do have a bug, exploiting it is a lot easier.

This can be especially true for a project that has never had a security audit or any prior attempt at fuzzing. I discovered ~6 more security issues when working on XSA-115, iterating through finding and fixing new issues. In this case there were actually 2 implementations of the same protocol: one in C and one in OCaml. Unfortunately, enough high-severity security issues (complete VM guest to host compromise) were discovered in both implementations that switching to the other implementation wouldn’t have protected you.

Why do we have fewer security issues?

It is easy to confuse these 5 situations, but you are only secure when you are in the first:

  • the library / application really has no security issues, and even if you look very hard you won’t find any
  • the library / application currently has no security issues, but if you look (hard enough) you’ll find some
  • there is a known way to design the architecture of the library / application such that a certain category of security issues is avoided (e.g. using the type system to guarantee some invariants). But that doesn’t necessarily mean that architecture was actually used in all the libraries you use.
  • there are well known and fixed security issues for various file formats / tools. But that doesn’t necessarily mean that an OCaml reimplementation didn’t fall into the same trap that the original authors of the C code did!
  • the OCaml Security Response Team and OCaml Security Advisory Database have only recently been established. There was no common place to report or check for vulnerabilities before, so the way issues got reported and fixed would’ve been different for each package.
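
As an illustration of the third situation (using the type system to guarantee an invariant), here is a minimal sketch. The module name `Node_name`, its functions, and the set of allowed characters are all made up for this example and not taken from any real library. It needs OCaml 4.13+ for `String.for_all`.

```ocaml
(* Hypothetical module: the only way to obtain a [Node_name.t] is through
   [of_string], so any function that takes a [Node_name.t] can rely on the
   validation having already happened. *)
module Node_name : sig
  type t
  val of_string : string -> t option
  val to_string : t -> string
end = struct
  type t = string

  (* Assumed policy for this sketch: a small ASCII whitelist. *)
  let valid_char = function
    | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' | '_' -> true
    | _ -> false

  let of_string s =
    if s <> "" && String.for_all valid_char s then Some s else None

  let to_string s = s
end
```

The guarantee only holds for code that actually takes a `Node_name.t`; a library that keeps passing raw strings around gets no benefit, which is the point of that bullet.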

So let’s look at what security issues can be present in OCaml programs, using examples of security issues that I discovered, was aware of, or worked on fixing myself in a project.
This list is by no means exhaustive.

More details about some of these are in this talk.

Conclusion

Don’t be over-confident about the security of an application based on the number of reported security issues. Do your own testing (and ideally contribute your tests to the upstream project too!).

Examples of security issues

Security is not always a composable property

Security is a property of the system as a whole, and even if you individually tested each library / module, you can still have security issues at the point where these modules / libraries interact.

  • node ownership can be changed by unprivileged clients

    • Users have quotas, and this is correctly implemented
    • There are user permission checks on all operations, and they are correctly implemented (after applying all other fixes)
    • Users can change the ownership of a node they’ve created, and because quota is tied to node ownership they have effectively gained infinite quotas by giving away their nodes to another user
  • Xenstore: Cooperating guests can create arbitrary numbers of nodes

    • this is again due to quota being tracked based on ownership instead of node creator
  • UTF-8 string handling, mix of Unicode 3.0 encoder with 3.1 decoder

    • individually each module was correct: it was able to save and load its own data using the Unicode version it has implemented (and implementing an older version of Unicode than it documented would’ve been merely a documentation bug, and not a security bug)
    • Unicode 3.1 is meant to fix some potential security issues with applications that handle Unicode 3.0, but it actually introduces a security issue if you mix it with any Unicode 3.0 encoder in the same application, because it will no longer be able to decode all files encoded by the former. If that file is used on startup, it can prevent the application from starting up, resulting in a persistent DoS attack.
    • If all modules use Unicode 3.0, or all modules use Unicode 3.1+ UTF-8 encoding then there are no security issues
    • The fix here is to ensure that all UTF-8 encoders/decoders use a common implementation (ideally the one in the standard library, or uutf). The implementation predates both of these by several years though.
  • XAPI: guest triggered excessive memory usage

    • there are various quotas, but they interact in a bad way because the quotas are based on number of items, not on memory used. A better way to express quotas would’ve been to account for total memory used (which would’ve been a composable property).
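
To make the quota/ownership interaction from the first two examples concrete, here is a toy model. It is not the oxenstored code: the `node` type, the `quota` value and all function names are assumptions made for this sketch.

```ocaml
(* Hypothetical model: a node records both who created it and who owns it. *)
type node = { creator : int; mutable owner : int }

(* Count the nodes charged to [user]; [key] picks the field used for accounting. *)
let usage key nodes user =
  List.length (List.filter (fun n -> key n = user) nodes)

let by_owner n = n.owner
let by_creator n = n.creator

let quota = 2

(* Create a node for [user] if the chosen accounting says there is room. *)
let create key nodes user =
  if usage key nodes user >= quota then Error `Quota_exceeded
  else Ok ({ creator = user; owner = user } :: nodes)

(* Ownership transfer: correct on its own, like the quota check above. *)
let chown node ~new_owner = node.owner <- new_owner

let () =
  let get = function Ok v -> v | Error _ -> failwith "unexpected quota error" in
  let nodes = get (create by_owner (get (create by_owner [] 1)) 1) in
  (* User 1 is at quota, then gives both nodes away to user 2. *)
  List.iter (fun n -> chown n ~new_owner:2) nodes;
  (* Owner-based accounting: user 1 is back to zero and can create again. *)
  assert (usage by_owner nodes 1 = 0);
  (* Creator-based accounting still charges user 1 for both nodes. *)
  assert (usage by_creator nodes 1 = 2)
```

Both `create` and `chown` pass their own tests; the problem only shows up when a test exercises them together.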

Conclusion

Testing / fuzzing the communication between modules / libraries / components / applications is still needed and very useful for OCaml too. Don’t skip this step when designing a secure library or application.
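
For the UTF-8 example, one cheap interaction test is an exhaustive round-trip check: everything the encoder can produce must be accepted by the decoder. The sketch below plugs in the standard library functions (OCaml 4.14+) as both sides; in a real project you would pass in whichever encoder and decoder actually end up in the same application. The names `encode`, `decode` and `first_mismatch` are mine.

```ocaml
(* Encoder under test: here the standard library's. *)
let encode (u : Uchar.t) : string =
  let b = Buffer.create 4 in
  Buffer.add_utf_8_uchar b u;
  Buffer.contents b

(* Decoder under test: returns None when it rejects the input. *)
let decode (s : string) : Uchar.t option =
  let d = String.get_utf_8_uchar s 0 in
  if Uchar.utf_decode_is_valid d && Uchar.utf_decode_length d = String.length s
  then Some (Uchar.utf_decode_uchar d)
  else None

(* Walk all Unicode scalar values and return the first one that the decoder
   cannot read back from the encoder's output, if any. *)
let first_mismatch ~encode ~decode =
  let rec go u =
    if decode (encode u) <> Some u then Some u
    else if Uchar.equal u Uchar.max then None
    else go (Uchar.succ u)
  in
  go Uchar.min
```

A result of `None` means the pair agrees on every scalar value. A fuzzer would extend this to whole strings and to the other direction (decoder output fed back to the encoder).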

Space leaks

Memory leaks are still possible if you have long-lived values ~matching the lifetime of your program, such as global caches.

Lack of destructors

  • memory leak in reset watches
    • entries are added to a global cache when watches are created for a connection, but one cleanup path was not removing them from the global cache. A language with destructors might’ve forced the programmer to think about what the resource lifetimes are. In OCaml this is more ad-hoc: a solution similar to destructors can be achieved (deleting from the global cache whenever the entry itself is deleted), but that is not how it was implemented, and there were 2 separate delete calls.
  • oxenstored keeps quota related use counts across domain destruction
    • several years later, a bug similar to the one above, in a different place
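
A rough sketch of the "single delete path" idea mentioned above, so that the per-connection table and the global cache cannot get out of sync. This is a simplified model with made-up types and names, not the actual oxenstored data structures.

```ocaml
(* Hypothetical model: a global cache indexes every watch by
   (connection id, path); each connection also tracks its own watches. *)
let global_cache : (int * string, unit) Hashtbl.t = Hashtbl.create 16

type connection = { id : int; watches : (string, unit) Hashtbl.t }

let add_watch con path =
  Hashtbl.replace con.watches path ();
  Hashtbl.replace global_cache (con.id, path) ()

(* The only function that removes a watch: it always updates both tables. *)
let del_watch con path =
  Hashtbl.remove con.watches path;
  Hashtbl.remove global_cache (con.id, path)

(* Every cleanup path (reset, disconnect, ...) goes through [del_watch]. *)
let reset_watches con =
  (* Collect the paths first: the table must not be mutated while folding. *)
  let paths = Hashtbl.fold (fun p () acc -> p :: acc) con.watches [] in
  List.iter (del_watch con) paths
```

Hiding `global_cache` behind a module signature that exposes only `add_watch`, `del_watch` and `reset_watches` would go one step further, since no other code could then forget half of the cleanup.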

Conclusion

The presence of a garbage collector and memory safety doesn’t mean you shouldn’t be thinking about resource usage or memory consumption, especially if you handle untrusted input of arbitrary size.

Well known security issues that have been fixed elsewhere

  • XAPI HTTP directory traversal
    • “classic” HTTP server bug
  • XAPI open file limit DoS
    • another “classic” bug
  • Multiple RBAC issues in XAPI
    • on Linux you’d have SELinux, where you can describe the desired security properties more directly based on data flows and transitions, instead of ad-hoc permission checks. Implementing something similar is considerably more complex though, so it wasn’t done here.

Conclusion

Test for common/known security issues that have been fixed in other implementations. Fuzzing can also uncover bugs (although you’ll need to define the desired security properties first).
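
As an example of what such a test can look like for the “classic” directory traversal bug, here is a small path check together with the kind of inputs worth throwing at it. The `sanitize` function is a sketch of mine, not XAPI code, and it deliberately ignores percent-decoding and symlinks, which a real HTTP server also has to handle.

```ocaml
(* Normalise a '/'-separated request path into its segments.
   Returns None if the path would escape the served root. *)
let sanitize (request : string) : string list option =
  let step acc seg =
    match acc, seg with
    | None, _ -> None
    | Some stack, ("" | ".") -> Some stack      (* empty and "." segments are no-ops *)
    | Some [], ".." -> None                     (* ".." at the root: escape attempt *)
    | Some (_ :: rest), ".." -> Some rest       (* ".." pops one directory *)
    | Some stack, seg -> Some (seg :: stack)
  in
  String.split_on_char '/' request
  |> List.fold_left step (Some [])
  |> Option.map List.rev

let () =
  assert (sanitize "/a/b/../c" = Some [ "a"; "c" ]);
  assert (sanitize "/../etc/passwd" = None);
  assert (sanitize "a/../../x" = None)
```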

Optimizations that defeat security properties

Maybe a case of premature optimization sneaking in a security vulnerability:

  • oxenstored: permissions not checked on tree root node
    • an optimization (deleting the root just replaces the tree with []) skipped the ACL checks. A better architecture that separates ACL checks from execution is known (Your Server as a Function), but the implementation predates it by ~4 years.
    • deleting the root node is not a common operation (in fact there is no legitimate reason for attempting to do it), so optimizing for it wasn’t worthwhile in the first place
    • No unsafe code involved here, and yet the result is a full guest-to-host-root privilege escalation
    • issue discovered while writing a fuzzer for another security issue
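
A sketch of what separating the ACL check from execution can look like, loosely in the spirit of the filter idea from Your Server as a Function. The request type, the policy in `allowed` and the list-of-paths store are all invented for this example; the real oxenstored code is structured differently.

```ocaml
type request = { user : int; path : string list; op : [ `Rm | `Read ] }

(* Hypothetical policy: only user 0 may remove nodes; everyone may read. *)
let allowed req =
  match req.op with
  | `Read -> true
  | `Rm -> req.user = 0

let rec is_prefix p q =
  match p, q with
  | [], _ -> true
  | x :: p', y :: q' -> x = y && is_prefix p' q'
  | _ :: _, [] -> false

(* Handlers contain no permission logic, so they can be optimised freely.
   The store is modelled as a plain list of node paths. *)
let execute store req =
  match req.op, req.path with
  | `Rm, [] -> []  (* fast path: removing the root empties the store *)
  | `Rm, p -> List.filter (fun q -> not (is_prefix p q)) store
  | `Read, _ -> store

(* The check runs before any handler, including the fast path. *)
let handle store req =
  if allowed req then Ok (execute store req) else Error `Permission_denied
```

With this shape, adding or changing a fast path inside `execute` cannot bypass the check, as long as `handle` is the only entry point exposed to clients.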

Conclusion

Before optimizing code, try to clearly separate out the security-sensitive parts.
