It’s been a while, b/c I got distracted, but here’s some minor progress on understanding your performance problem.
repo.path: https://github.com/chetmurthy/sandbox-public/tree/master/csv-bench
There you’ll find your python code (py1.py), an ocaml version (caml1.ml) and then versions that just do the parsing-to-NeuronNode (py2.py, caml2.ml). Also, both bytecode and opt versions of the OCaml code. To build+run (make bench) you have to opam install pa_ppx_regexp, and that’ll pull in a ton of packages, so I ran it for you and attach the output below.
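To make "parsing-to-NeuronNode" concrete, here is a minimal Python sketch of that step. It is not the code from py2.py: the field names are taken from the Dump lines in the output below, while the regex, the `parse_line` helper and the sample input line are assumptions (the input line is reconstructed from the dumped values, on the assumption that an SWC data line is seven whitespace-separated fields).

```python
import re
from typing import NamedTuple, Tuple


class NeuronNode(NamedTuple):
    # Field names copied from the "Dump:" lines in the benchmark output.
    sample_number: int
    structure_id: int
    coord_triple: Tuple[float, float, float]
    radius: float
    parent_sample_number: int


# Hypothetical pattern: id, type, x, y, z, radius, parent, separated by whitespace.
SWC_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(-?\d+)\s*$"
)


def parse_line(line: str) -> NeuronNode:
    """Parse one SWC data line into a NeuronNode (hypothetical helper)."""
    m = SWC_LINE.match(line)
    if m is None:
        raise ValueError(f"not an SWC data line: {line!r}")
    n, sid, x, y, z, r, parent = m.groups()
    return NeuronNode(
        int(n), int(sid), (float(x), float(y), float(z)), float(r), int(parent)
    )


# Input line reconstructed from the dumped node 201; the real file line may differ.
node = parse_line("201 3 594.5597 444.0379 80.8959 0.1373 200")
print(node)
```

Almost all the work here is regex matching plus string-to-number conversion, which is why the choice of regex engine shows up so clearly in the timings.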
BTW, I originally wrote the code to use re, but then switched to pcre (it’s trivial b/c I’m using pa_ppx_regexp which does all the work behind the curtains of a PPX extension).
In short, the ocaml bytecode is about 100% slower than the Python code. The ocaml opt is a little bit faster than Python. Digging into the inner loop (parsing-to-NeuronNode), ocaml bytecode is maybe 50% slower than Python. Interestingly, with re (instead of pcre) the ocaml bytecode was 4x slower than python.
Anyway, it remains to look at the rest of the bits of the code and see where the rest of the time is going.
This isn’t terribly surprising though: both Python and Perl work hard to optimize string-handling, and for instance Perl has a ton of clever hacks to make common idioms in string-handling as close to hand-optimized C as possible. I wouldn’t be surprised to learn that Python does the same.
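For reading the output below, the warmup and count flags can be pictured as a warmup-then-measure loop. This is only a sketch of that general shape, not the harness from the repo: the `bench` and `work` names are hypothetical, and it assumes "elapsed" is total wall-clock time over the counted iterations, which the output itself does not confirm.

```python
import time


def bench(fn, warmup, count):
    """Run fn `warmup` times untimed, then time `count` further calls.

    Hypothetical helper; returns total elapsed seconds for the timed calls.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(count):
        fn()
    return time.perf_counter() - start


def work():
    # Stand-in workload: string splitting plus float conversion.
    return [float(f) for f in "594.5597 444.0379 80.8959".split()]


elapsed = bench(work, warmup=100, count=10_000)
print(f"elapsed: {elapsed}")
```

The warmup iterations are there so that one-time costs (regex compilation, cache warming, allocator growth) do not land inside the timed region.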
RUN -s OCaml make bench
==== py1 ====
python py1.py -w 100 -c 10 -d 201 CAJAL/CAJAL/data/swc/320668879.swc
filename=CAJAL/CAJAL/data/swc/320668879.swc, verbose=False, warmup=10, count=100
# entries: 3795
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.5597, 444.0379, 80.8959), radius=0.1373, parent_sample_number=200)
elapsed: 0.07812173804268241
==== caml1 ====
./caml1 -warmup 100 -count 10 -dump 201 CAJAL/CAJAL/data/swc/320668879.swc
filename=CAJAL/CAJAL/data/swc/320668879.swc, verbose=false, warmup=100, count=10
# entries: 3795
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 0.124039
==== caml1.opt ====
./caml1.opt -warmup 100 -count 10 -dump 201 CAJAL/CAJAL/data/swc/320668879.swc
filename=CAJAL/CAJAL/data/swc/320668879.swc, verbose=false, warmup=100, count=10
# entries: 3795
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 0.069903
==== py2 ====
python py2.py -w 100 -c 1000000 -d
verbose=False, warmup=1000000, count=100
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.5597, 444.0379, 80.8959), radius=0.1373, parent_sample_number=200)
elapsed: 1.685014556045644
==== caml2 ====
./caml2 -warmup 100 -count 1000000 -dump
verbose=false, warmup=100, count=1000000
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 2.006684
==== caml2.opt ====
./caml2.opt -warmup 100 -count 1000000 -dump
verbose=false, warmup=100, count=1000000
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 0.954097
==== py2 count=37950 ====
python py2.py -w 100 -c 37950 -d
verbose=False, warmup=37950, count=100
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.5597, 444.0379, 80.8959), radius=0.1373, parent_sample_number=200)
elapsed: 0.06500750803388655
==== caml2 count=37950 ====
./caml2 -warmup 100 -count 37950 -dump
verbose=false, warmup=100, count=37950
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 0.079335
==== caml2.opt count=37950 ====
./caml2.opt -warmup 100 -count 37950 -dump
verbose=false, warmup=100, count=37950
Dump: NeuronNode(sample_number=201, structure_id=3, coord_triple=(594.559700, 444.037900, 80.895900), radius=0.137300, parent_sample_number=200)
elapsed: 0.038155