OCaml bytecode performance?

It’s been a while (I got distracted), but here’s some minor progress on understanding your performance problem.

Repo: https://github.com/chetmurthy/sandbox-public/tree/master/csv-bench

There you’ll find your Python code (py1.py), an OCaml version (caml1.ml), and then versions that do only the parsing-to-NeuronNode step (py2.py, caml2.ml), plus both bytecode and native (opt) builds of the OCaml code. To build and run (make bench) you have to opam install pa_ppx_regexp, and that pulls in a ton of packages, so I ran it for you and attach the output below.

BTW, I originally wrote the code to use re, but then switched to pcre (the switch is trivial because I’m using pa_ppx_regexp, which does all the work behind the curtain of a PPX extension).

In short, the OCaml bytecode is about 100% slower (roughly 2x) than the Python code. The OCaml native (opt) build is a little faster than Python. Digging into the inner loop (parsing-to-NeuronNode), OCaml bytecode is maybe 50% slower than Python. Interestingly, with re (instead of pcre) the OCaml bytecode was 4x slower than Python.
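For reference, the parsing-to-NeuronNode inner loop is essentially this kind of work. Here is a minimal Python sketch (not the actual py2.py): the NeuronNode field names are taken from the Dump lines in the transcript below, and the regex and field layout are my assumption about the standard SWC record format (sample, structure, x, y, z, radius, parent):

```python
import re
from typing import NamedTuple

class NeuronNode(NamedTuple):
    sample_number: int
    structure_id: int
    coord_triple: tuple
    radius: float
    parent_sample_number: int

# One SWC record: sample, structure, x, y, z, radius, parent (whitespace-separated).
SWC_LINE = re.compile(r"\s*(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(-?\d+)")

def parse_swc_line(line):
    m = SWC_LINE.match(line)
    if m is None:
        return None  # comment or blank line
    g = m.groups()
    return NeuronNode(
        sample_number=int(g[0]),
        structure_id=int(g[1]),
        coord_triple=(float(g[2]), float(g[3]), float(g[4])),
        radius=float(g[5]),
        parent_sample_number=int(g[6]),
    )

node = parse_swc_line("201 3 594.5597 444.0379 80.8959 0.1373 200")
```

The benchmark numbers in this thread are dominated by how fast the regex engine (re vs. pcre) and the float/int conversions run in this loop.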

Anyway, it remains to look at the rest of the code and see where the remaining time is going.

This isn’t terribly surprising, though: both Python and Perl work hard to optimize string handling; Perl, for instance, has a ton of clever hacks to make common string-handling idioms run as close to hand-optimized C as possible. I wouldn’t be surprised to learn that Python does the same.

RUN -s OCaml make bench
==== py1 ====
python py1.py -w 100 -c 10 -d 201 CAJAL/CAJAL/data/swc/320668879.swc
filename=CAJAL/CAJAL/data/swc/320668879.swc, verbose=False, warmup=10,  count=100

# entries:  3795
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.5597, 444.0379, 80.8959), radius=0.1373, parent_sample_number=200)
elapsed: 0.07812173804268241
==== caml1 ====
./caml1 -warmup 100 -count 10 -dump 201 CAJAL/CAJAL/data/swc/320668879.swc
filename=CAJAL/CAJAL/data/swc/320668879.swc, verbose=false, warmup=100,  count=10
# entries: 3795
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 0.124039
==== caml1.opt ====
./caml1.opt -warmup 100 -count 10 -dump 201 CAJAL/CAJAL/data/swc/320668879.swc
filename=CAJAL/CAJAL/data/swc/320668879.swc, verbose=false, warmup=100,  count=10
# entries: 3795
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 0.069903
==== py2 ====
python py2.py -w 100 -c 1000000 -d
verbose=False, warmup=1000000,  count=100

Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.5597, 444.0379, 80.8959), radius=0.1373, parent_sample_number=200)
elapsed: 1.685014556045644
==== caml2 ====
./caml2 -warmup 100 -count 1000000 -dump
verbose=false, warmup=100,  count=1000000
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 2.006684
==== caml2.opt ====
./caml2.opt -warmup 100 -count 1000000 -dump
verbose=false, warmup=100,  count=1000000
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 0.954097
==== py2 count=37950 ====
python py2.py -w 100 -c  37950 -d
verbose=False, warmup=37950,  count=100

Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.5597, 444.0379, 80.8959), radius=0.1373, parent_sample_number=200)
elapsed: 0.06500750803388655
==== caml2 count=37950 ====
./caml2 -warmup 100 -count 37950 -dump
verbose=false, warmup=100,  count=37950
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 0.079335
==== caml2.opt count=37950 ====
./caml2.opt -warmup 100 -count 37950 -dump
verbose=false, warmup=100,  count=37950
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 0.038155

Thanks, Chet, that’s really interesting. I appreciate you taking the time to look into it.
The Python code I ran was

import os
import time

# read_swc_node_dict is defined elsewhere in the code under discussion
filenames = ["/home/patn/recon/swc/" + filename for filename in os.listdir("/home/patn/recon/swc/")]
start = time.time()
ell = [read_swc_node_dict(f) for f in filenames]
stop = time.time()
print(stop - start)

I also tried using Jupyter Notebook’s %%time magic, which gave substantially similar results.
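For this sort of measurement, the standard-library timeit module (or time.perf_counter) tends to be more robust than wall-clock time.time() or the %%time magic, since it uses a high-resolution clock and amortizes per-call overhead over many runs. A sketch, where work() is just a stand-in for the real per-file parse:

```python
import timeit

def work():
    # Stand-in workload; substitute the real per-file parse here.
    return sum(i * i for i in range(1000))

# Run the callable 100 times and report total elapsed seconds.
elapsed = timeit.timeit(work, number=100)
per_call = elapsed / 100
```

timeit also disables the garbage collector by default during timing, which removes another source of run-to-run noise.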
The OCaml program was

open Swc.Parse
open Swc.Batch

let dir = Sys.argv.(1)

let f neuron =
  let x = Unix.gettimeofday () in
  let _ = NeuronTree.height neuron in
  let y = Unix.gettimeofday () in
  (y -. x)

let hashtbl_seq = map_over_dir Read_swc.seq_of_swc hash_of_seq dir

let () =
  Core.Sequence.iter hashtbl_seq ~f:(fun a ->
      let ell = Core.Hashtbl.keys a in
      Printf.printf "%d\n" (List.hd ell))

where the “map_over_dir” function applies Read_swc.seq_of_swc to each file to construct a sequence of nodes, and then applies hash_of_seq to build a hash table from that sequence.

I found a 2008 paper (pretty old and outdated at this point) in which some research scientists compare and contrast common languages in bioinformatics. It says:
“Perl versus Python: Perl clearly outperformed Python for I/O operations. Perl was three times as fast as Python when reading a FASTA file and needed half of the space to store the sequences in memory. From the results of the global alignment and NJ programs Python appeared to have better character string manipulation capabilities than Perl.”

It seems like Python has upped their game since this paper was written!