How can I locate a variable in an OCaml binary file?



For the following code:

let kiukotsu = ref "abcdefghijklmnopqrstuvwxyz"
let cronus = ref "1234567890"

let () = 
    print_endline !kiukotsu;
    print_endline !cronus

I ran the command ocamlopt -o hello. Then I analyzed the binary file with the following steps:
Step 1: nm hello | grep caml_globals, which gave the following output:

000000000063c548 D caml_globals
0000000000640320 B caml_globals_inited
000000000063c578 D caml_globals_map
0000000000640378 b caml_globals_scanned
0000000000423c80 T caml_globalsym

Step 2: hd -s0x3c548 -n128 hello (0x600000 is the virtual address base, so I subtracted it), which gave:

0003c548  c8 c9 63 00 00 00 00 00  38 dd 63 00 00 00 00 00  |..c.....8.c.....|
0003c558  d0 c8 63 00 00 00 00 00  38 c8 63 00 00 00 00 00  |..c.....8.c.....|
0003c568  00 00 00 00 00 00 00 00  fc 77 00 00 00 00 00 00  |.........w......|
0003c578  84 95 a6 be 00 00 00 d0  00 00 00 18 00 00 00 6e  |...............n|
0003c588  00 00 00 58 a0 c0 38 43  61 6d 6c 69 6e 74 65 72  |...X..8Camlinter|
0003c598  6e 61 6c 46 6f 72 6d 61  74 42 61 73 69 63 73 30  |nalFormatBasics0|
0003c5a8  7e a7 9e 60 8e 46 b4 1c  80 c3 25 17 73 e6 fd f3  |~..`.F....%.s...|
0003c5b8  30 c9 bf 2e 71 6f 3b ba  c0 7f 6d 2c c3 5b ab d2  |0...qo;...m,.[..|

Step 3: hd -s0x3c8d0 -n128 hello, which gave:

0003c8d0  c0 c8 63 00 00 00 00 00  00 00 00 00 00 00 00 00  |..c.............|
0003c8e0  fc 13 00 00 00 00 00 00  61 62 63 64 65 66 67 68  |........abcdefgh|
0003c8f0  69 6a 6b 6c 6d 6e 6f 70  71 72 73 74 75 76 77 78  |ijklmnopqrstuvwx|
0003c900  79 7a 00 00 00 00 00 05  fc 0b 00 00 00 00 00 00  |yz..............|
0003c910  31 32 33 34 35 36 37 38  39 30 00 00 00 00 00 05  |1234567890......|
0003c920  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0003c930  03 00 00 00 00 00 00 00  c1 f7 40 00 00 00 00 00  |..........@.....|
0003c940  11 00 00 00 00 00 00 00  88 c9 63 00 00 00 00 00  |..........c.....|

It seems I found the variables’ locations in the binary file, but the addresses deviate slightly from the actual addresses. So how can I precisely locate the variables in the binary file?


It would be rather surprising if something that will get loaded at address X was found at location X in the file. I suggest spending some time learning how the ELF object format (or the format for your machine, like Mach-O or PE/COFF) works.
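
As a rough sketch of the mapping an ELF reader performs: a virtual address that falls inside a section maps to the file offset vaddr - sh_addr + sh_offset. The OCaml snippet below is only an illustration. The `section` record, the helper `file_offset_of_vaddr`, and the `.data` values are all assumptions (the numbers are made up to be consistent with the 0x600000 difference seen in the question); real values would have to come from something like `readelf -S hello`.

```ocaml
(* Hypothetical, simplified view of an ELF section header:
   addr = sh_addr, offset = sh_offset, size = sh_size. *)
type section = { name : string; addr : int; offset : int; size : int }

(* Return the section containing [vaddr] together with the file offset
   of that address, or None if no listed section covers it. *)
let file_offset_of_vaddr sections vaddr =
  List.find_map
    (fun s ->
      if vaddr >= s.addr && vaddr < s.addr + s.size
      then Some (s.name, vaddr - s.addr + s.offset)
      else None)
    sections

(* Assumed example values, not read from the real binary. *)
let sections =
  [ { name = ".data"; addr = 0x63c000; offset = 0x3c000; size = 0x4000 } ]

let () =
  match file_offset_of_vaddr sections 0x63c548 with
  | Some (name, off) -> Printf.printf "%s: file offset 0x%x\n" name off
  | None -> print_endline "address not in any listed section"
```

Subtracting a fixed base such as 0x600000 only works when sh_addr - sh_offset happens to equal that base for the section in question, which is why going through the section headers is the more reliable route.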


If you want to find the location of the strings in memory, you can first find the symbols to which they are assigned: ocamlopt -c -dcmm gives you (among other things)

 int 5116
 string "abcdefghijklmnopqrstuvwxyz"
 skip 5
 byte 5)
(data int 3068 "camlHello__2": string "1234567890" skip 5 byte 5)

This means that the strings are defined as symbols camlHello__1 and camlHello__2. Note that these are the static constants, not the reference bound to the variable kiukotsu, which is dynamically allocated and won’t appear in the binary file.


Thanks for the help! I have another question: what is the meaning of int 5116 in the above output? :thinking:
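
For what it is worth, the int emitted before each string looks like the block header word that the OCaml runtime places in front of every block. Assuming the usual 64-bit layout without profiling info (size in words in the bits from bit 10 upward, two color bits, and an 8-bit tag), a small sketch can decode it. The function names `decode_header` and `string_length` are made up for illustration.

```ocaml
(* Split a header word into (size in words, color bits, tag),
   assuming the layout: wosize lsl 10 | color lsl 8 | tag. *)
let decode_header h = (h lsr 10, (h lsr 8) land 0x3, h land 0xff)

(* Byte length of a string block on a 64-bit target: the last byte of
   the block stores the amount of padding, which is where the
   "skip 5 / byte 5" in the dump comes from. *)
let string_length ~wosize ~last_byte = wosize * 8 - 1 - last_byte

let () =
  List.iter
    (fun h ->
      let wosize, color, tag = decode_header h in
      Printf.printf "%d -> wosize=%d color=%d tag=%d\n" h wosize color tag)
    [ 5116; 3068 ]
```

Under that assumption 5116 decodes to a 4-word block with tag 252 (the string tag), and 3068 to a 2-word block with the same tag. With a final padding byte of 5 this gives lengths 26 and 10, matching the two strings, and 5116 is 0x13fc, the `fc 13` bytes that sit right before "abcdefgh" in the hex dump.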


I know that. I just want to know the variable’s offset within the section, so that if I know the actual address of the section, I can calculate the actual address of the variables. Thanks for your reply.


If you really want to know how to find the actual address, learn the format of the object files. That said, I don’t understand why you would want to do this unless you’re writing a linker or some such.


Yes, you guessed right! I want to add some additional functionality to the OCaml compiler, so I need to know how the OCaml compiler manages variables.


This isn’t the way to learn that at all. Instead, why don’t you get the latest OCaml compiler sources from GitHub and read them? That might work a lot better.


If by variables in OCaml you mean references and other objects, then OCaml does not store them in the binary file, nor allocate them at any specific address or offset. Instead, values are created on the fly by the runtime in a minor heap (memory usually obtained via malloc or mmap). Depending on how long they live, they then either migrate from the minor heap to the major heap or are destroyed by the garbage collector once they are no longer referenced. In other words, it is very different from what you might think.
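
One way to look at values from inside a running program, rather than from the file, is the standard Obj module. The sketch below is only an illustration: Obj is an unsafe, internals-facing API, and the names `s`, `r`, and `word_bytes` are arbitrary. It prints the tag and size of a string block and of the ref cell that points to it.

```ocaml
let s = "abcdefghijklmnopqrstuvwxyz"
let r = ref s

(* Bytes per machine word on the current platform. *)
let word_bytes = Sys.word_size / 8

let () =
  let rs = Obj.repr s and rr = Obj.repr r in
  Printf.printf "string: tag=%d size=%d words\n" (Obj.tag rs) (Obj.size rs);
  Printf.printf "ref:    tag=%d size=%d words\n" (Obj.tag rr) (Obj.size rr);
  (* The single field of the ref is the very same string block. *)
  Printf.printf "ref points at string: %b\n" (Obj.field rr 0 == rs)
```

The ref is a one-field block whose field points at the string block, so what a program sees as "the variable" is a pointer chain between blocks rather than a fixed slot in the file.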


Yes, you are right, and I am doing this right now. In fact, I am reading the source code, and at the same time I want to study OCaml code from the binary perspective, so that I can understand in depth what happens in the OCaml runtime.


I can foresee these difficulties, and it is really hard to do. But I think we can locate values in memory with some techniques (for instance, tagging the variables in memory, or constructing a hash map similar to the symbol table in an ELF file). Whether these will work well, I am not sure yet.


It’s likely to be as enlightening as studying how cars are designed by looking at the output of a car crusher.

I could see doing things that way if you were an archeologist and the only evidence you had of cars was the output of a car crusher in an ancient junkyard, but wouldn’t it be better if you have access to them to just look at the cars before they get destroyed, look at the designs, read the manuals, and talk to the engineers?

You even have access to the assembly code the compiler produces. Why, if you absolutely wanted to, wouldn’t you look at that instead of at the binaries? What possible motivation could you have for looking at binaries if you’re not doing reverse engineering of an unknown system?


In fact, I am a newbie in the OCaml field. Reading the source code is definitely the most effective way to learn how the OCaml compiler works, but it will take me several months. I just wanted to do some experiments to find out how to get the variables’ locations. I think that through the experience of analyzing the binary code, I can build an intuition for how the OCaml compiler works, and that will help me learn the compiler’s code. C is the first and only language I have learned, and I have done a lot of C code analysis before, so I can only use what I am familiar with to learn a totally new compiler. But I think you are right, and I am going to read the source code. Maybe I will run into a number of problems. Anyway, it was a pleasant conversation and I think I’ve learned a lot. THANKS :blush: