We’re proud to release the next stable version of Carnegie Mellon University Binary Analysis Platform (BAP). The full list of changes can be found on the release page but the most interesting new features are highlighted below.
The Primus Lisp Frontend
Now BAP is able to understand not only binary programs but sources written in Primus Lisp. In case if you don’t know, Primus Lisp is our DSL for writing analysis and library stubs (e.g., to specify semantics of missing library functions). Now, it is possible to reify Primus Lisp programs into static representation. For example, we can translate the following Lisp program
;; file demo.lisp
(defun example1 (x)
(set R0 1)
(set R1 2)
(set R3 (+ R1 R2 (* R1 R2 3)))
(memory-write R4 (+ R3 R1))
(if (> R0 (* R0 R0))
(exec-addr 0xDEADBEEF)
(set R0 (* R0 R2 R3))))
into the BIL (BAP Instruction Language) AST and then pretty print it,
$ bap show --primus-lisp-load=demo --target=armv7+le -obap:bil example1
example1:
"{
R0 := 1
R1 := 2
R3 := R1 + R2 + R1 * R2 * 3
mem := mem with [R4] <- low:8[R3 + R1]
#1 := R0 * R0 < R0
if (#1) {
jmp 0xDEADBEEF
}
else {
R0 := R0 * R2 * R3
}
}"
This new feature not only allows us to reify our Lisp stubs into static form but also enables the main killer feature. It is now possible to specify the semantics of machine instructions in Primus Lisp. This feature enables rapid development and experimentation with CPU semantics. And this brings us to the next new feature.
New Target: RISC-V
The first application of the Primus Lisp Frontend was writing the RISC-V semantics. It took me only one day to write the semantic of the minimal subset of RISC-V instruction. Well, partially it is because RISCV-V is truly RISC, like the add
instruction just adds,
(defun ADDI (dst rm rn)
(set$ dst (+ rm rn)))
New Target: ARMv8 (Aarch64)
The next target that we tried was Aarch64, the 64-bit ARM architecture. It was a little bit harder but still definitely more readable than the official ARM semantics.
Adds namespaces (packages) to Primus Lisp
Since now we have much more code in Primus Lisp we found ourselves struggling with name clashes. The Primus Lisp program model is a set of mututally recursive overloaded definitions, so naming things is crucial for us. Therefore we implemented namespaces (which are, following Common Lisp trandition, named packages). We ended up in a very Common Lisp look and fill but without inheriting CL problems, like the dependency on the order of inclusion and package redefinitions, and so on. Given our model, and that Primus Lisp features type inference and Haskell-style type classes for overloading, it wasn’t that easy to implement
Adds the bap dependencies
command
The command outputs program dependencies such as libraries and symbols. The information is collected recursively with various output options, including dependency graph, YAML, JSON, and SEXP.
Much like nm
+ldd
on steroids and cross-platform (works on PE/ELF/COFF, and on binaries that are not native to the host). So it could be quite useful even if you’re not doing program analysis, but just want to solve a nasty missing library feature or figure our what programs use what libraries, e.g.,
$ bap dependencies `which ping` --recursive --ldconfig -ograph | graph-easy --as boxart
┌────────────────┐
│ libresolv.so.2 │ ──────────────────────────────────┐
└────────────────┘ │
▲ │
│ │
│ │
┌──────────────┐ ┌──────────────────────────┐ ┌────────────────┐ │
│ libidn.so.11 │ ◀── │ ping │ ──▶ │ libnettle.so.6 │ │
└──────────────┘ └──────────────────────────┘ └────────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ │ │ │
│ ┌────────────────┐ │ │ │
│ │ libcap.so.2 │ │ │ │
│ └────────────────┘ │ │ │
│ │ │ │ │
│ │ │ │ │
│ ▼ ▼ │ │
│ ┌──────────────────────────┐ │ │
└────────────────▶ │ libc.so.6 │ ◀─────┘ │
└──────────────────────────┘ │
│ ▲ │
│ └───────────────────────────┘
▼
┌────────────────┐
│ ld-linux.so.2 │
└────────────────┘
What’s Next?
We are working on decompilation and integrating with Ghidra, so in 2.4.0 you should expect that bap will output C code for binaries. But it is not all, we’re even working into turning BAP into a program analysis framework that enables analysis of source code programs. And even crazier, we’re working on adding compilation capabilities to BAP, i.e., an ability to compile/recompile the input sources. So soon BAP will outlive its name, or we will need to find a new interpretation for the BAP acronym, something like the Best Analysis Platform
We also plan to make BAP more available for non-seasoned OCaml users and want to push bap into mainstream Linux distributions and overall lower the entrance barrier. Of course, with the end goal to lure users into installing opam))
Questions and Suggestions
Please, do not hesitate to ask questions and provide your suggestions and, ideally, join our community. Even if you don’t plan to work on binary analysis, BAP offers lots of opportunities for writing your toy programs for learning the language, or maybe even student projects.