[ANN] BAP 2.3.0 Release

We’re proud to release the next stable version of Carnegie Mellon University Binary Analysis Platform (BAP). The full list of changes can be found on the release page but the most interesting new features are highlighted below.

The Primus Lisp Frontend

Now BAP is able to understand not only binary programs but sources written in Primus Lisp. In case if you don’t know, Primus Lisp is our DSL for writing analysis and library stubs (e.g., to specify semantics of missing library functions). Now, it is possible to reify Primus Lisp programs into static representation. For example, we can translate the following Lisp program

;; file demo.lisp

(defun example1 (x)
  (set R0 1)
  (set R1 2)
  (set R3 (+ R1 R2 (* R1 R2 3)))
  (memory-write R4 (+ R3 R1))
  (if (> R0 (* R0 R0))
      (exec-addr 0xDEADBEEF)
    (set R0 (* R0 R2 R3))))

into the BIL (BAP Instruction Language) AST and then pretty print it,

$ bap show --primus-lisp-load=demo --target=armv7+le -obap:bil example1
example1:
"{
   R0 := 1
   R1 := 2
   R3 := R1 + R2 + R1 * R2 * 3
   mem := mem with [R4] <- low:8[R3 + R1]
   #1 := R0 * R0 < R0
   if (#1) {
     jmp 0xDEADBEEF
   }
   else {
     R0 := R0 * R2 * R3
   }
 }"

This new feature not only allows us to reify our Lisp stubs into static form but also enables the main killer feature. It is now possible to specify the semantics of machine instructions in Primus Lisp. This feature enables rapid development and experimentation with CPU semantics. And this brings us to the next new feature.

New Target: RISC-V

The first application of the Primus Lisp Frontend was writing the RISC-V semantics. It took me only one day to write the semantic of the minimal subset of RISC-V instruction. Well, partially it is because RISCV-V is truly RISC, like the add instruction just adds,

(defun ADDI (dst rm rn)
  (set$ dst (+ rm rn)))

New Target: ARMv8 (Aarch64)

The next target that we tried was Aarch64, the 64-bit ARM architecture. It was a little bit harder but still definitely more readable than the official ARM semantics.

Adds namespaces (packages) to Primus Lisp

Since now we have much more code in Primus Lisp we found ourselves struggling with name clashes. The Primus Lisp program model is a set of mututally recursive overloaded definitions, so naming things is crucial for us. Therefore we implemented namespaces (which are, following Common Lisp trandition, named packages). We ended up in a very Common Lisp look and fill but without inheriting CL problems, like the dependency on the order of inclusion and package redefinitions, and so on. Given our model, and that Primus Lisp features type inference and Haskell-style type classes for overloading, it wasn’t that easy to implement :slight_smile:

Adds the bap dependencies command

The command outputs program dependencies such as libraries and symbols. The information is collected recursively with various output options, including dependency graph, YAML, JSON, and SEXP.

Much like nm+ldd on steroids and cross-platform (works on PE/ELF/COFF, and on binaries that are not native to the host). So it could be quite useful even if you’re not doing program analysis, but just want to solve a nasty missing library feature or figure our what programs use what libraries, e.g.,

$ bap dependencies `which ping` --recursive --ldconfig -ograph | graph-easy --as boxart
                     ┌────────────────┐
                     │ libresolv.so.2 │ ──────────────────────────────────┐
                     └────────────────┘                                   │
                       ▲                                                  │
                       │                                                  │
                       │                                                  │
┌──────────────┐     ┌──────────────────────────┐     ┌────────────────┐  │
│ libidn.so.11 │ ◀── │           ping           │ ──▶ │ libnettle.so.6 │  │
└──────────────┘     └──────────────────────────┘     └────────────────┘  │
  │                    │                 │              │                 │
  │                    │                 │              │                 │
  │                    ▼                 │              │                 │
  │                  ┌────────────────┐  │              │                 │
  │                  │  libcap.so.2   │  │              │                 │
  │                  └────────────────┘  │              │                 │
  │                    │                 │              │                 │
  │                    │                 │              │                 │
  │                    ▼                 ▼              │                 │
  │                  ┌──────────────────────────┐       │                 │
  └────────────────▶ │        libc.so.6         │ ◀─────┘                 │
                     └──────────────────────────┘                         │
                       │                      ▲                           │
                       │                      └───────────────────────────┘
                       ▼
                     ┌────────────────┐
                     │ ld-linux.so.2  │
                     └────────────────┘

What’s Next?

We are working on decompilation and integrating with Ghidra, so in 2.4.0 you should expect that bap will output C code for binaries. But it is not all, we’re even working into turning BAP into a program analysis framework that enables analysis of source code programs. And even crazier, we’re working on adding compilation capabilities to BAP, i.e., an ability to compile/recompile the input sources. So soon BAP will outlive its name, or we will need to find a new interpretation for the BAP acronym, something like the Best Analysis Platform :wink:

We also plan to make BAP more available for non-seasoned OCaml users and want to push bap into mainstream Linux distributions and overall lower the entrance barrier. Of course, with the end goal to lure users into installing opam))

Questions and Suggestions

Please, do not hesitate to ask questions and provide your suggestions and, ideally, join our community. Even if you don’t plan to work on binary analysis, BAP offers lots of opportunities for writing your toy programs for learning the language, or maybe even student projects.

13 Likes

I wonder if it’s possible to write automatic converter from SLEIGH to this new Primus Lisp architecture descriptions. That would be an amazing project.

Back in time I suggested to use full-fledged Datalog as a SLEIGH language, since semantically they are close:

It would be hard, as Primus Lisp semantic definitions are written by hand using high-level features of the language such as macro and static functions. In fact, Primus Lisp parser is more like a super-compiler which partially evaluates the input and produces the final refinement of the program in the form of the core theory terms, which are of low-level.

I presume it would be possible to write the sleigh parser in OCaml and then a transpiler that will output Primus Lisp, but I believe it would be non-trivial and I honestly don’t see any reason to do this, is there any problems with sleigh?