How good is OCaml’s bytecode performance compared to other bytecode or interpreted languages, such as Java? What’s the right comparison or benchmark — which languages are the closest neighbors to OCaml’s bytecode interpreter in terms of performance? I would guess it’s faster than Python, Ruby, and R, but I would like to hear people confirm this.
How would you contrast the performance of an OCaml bytecode program with the native program? In general we should expect the native code to be faster, but how much faster? I am sure it depends on many factors; which factors are most important? Where should we expect a big speedup, and where a small one?
I just did an experiment where I read a bunch of text files. The compiled version took 3.9 seconds and the bytecode version took 15.9 seconds, which catches me off guard. I find it surprising that bytecode vs. compiled would make a difference for the act of reading files. Intuitively, since the lion’s share of the time in the program is spent not computing but merely waiting on system resources, I would have expected this not to vary much. In both cases you tell the system you want to read lines from a text file, and you wait for the system to deliver the data. Why is compiled vs. interpreted a big deal in how long that request takes to fulfill?
For Q1, I have done a simple comparison with Python as follows. I read a comma-separated-value text file with hundreds to thousands of lines, using Core In_channel.read_lines, where each line has a unique integer identifier and 8 fields. I build a record from each line, containing those 8 fields, and assemble them into an OCaml (Jane Street Core) Map or Hashtbl / a Python dict, respectively, where the integer identifier is the index for the record. I have tried using both OCaml’s lists and (Core) Sequences to tweak the performance. No matter how I write the OCaml code, the Python is substantially faster than the OCaml bytecode.
Python: Wall time 10s, sys time 0.4s
Ocaml bytecode: Wall time 23s, sys time 1.2s
Ocaml native: Wall time 4.7s, sys time 0.040s
It is hard to believe that Python beats OCaml bytecode in file IO and in sys time! I am happy to take suggestions about how I could rearchitect the OCaml program for improved performance. It seems like the mutable OCaml hash table to Python dictionary comparison should be apples-to-apples, but I could be wrong about this.
It depends a lot on your program. As a general rule, the difference will be lower for IO-bound programs and larger for CPU-bound programs. Speedups of ~2x to ~10x are typical, depending on the program payload.
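To make the CPU-bound end of that range concrete, here is a toy benchmark sketch (the workload and helper names are made up for illustration). Compiling it once with ocamlc and once with ocamlopt and comparing the reported times shows the kind of gap being discussed; tight arithmetic loops like this are where the bytecode interpreter loses the most:

```ocaml
(* Toy CPU-bound workload, timed with Sys.time from the stdlib.
   Build with both `ocamlc` (bytecode) and `ocamlopt` (native)
   to compare wall-clock behavior on the same payload. *)
let fib_iter n =
  let rec go i a b = if i = 0 then a else go (i - 1) b (a + b) in
  go n 0 1

(* Time a single application of [f] to [x], returning the result
   and the elapsed processor time in seconds. *)
let time f x =
  let t0 = Sys.time () in
  let r = f x in
  (r, Sys.time () -. t0)
```

An IO-bound program run through the same harness would show a much smaller bytecode/native gap, since most of its time is spent in the same runtime and kernel code either way.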
I suggest you post the program you are benchmarking somewhere, otherwise it is impossible to explain the results you are seeing…
For parsing csv files, the fastest approach in OCaml is likely to write a lexer and parser (using ocamllex and menhir).
I don’t know how your Python code is written, but if you’re relying on an external library it’s likely much faster than if you had to write it by hand (which is what you’re doing for OCaml).
OCaml bytecode is decent, but it’s interpreted. Python is also interpreted, but it contains countless shortcuts into C that accelerate its performance; OCaml doesn’t do this. Java and JavaScript use JITs, so they will blow away OCaml bytecode’s performance.
Hi Nicolas (@nojb);
Yes, of course. Here is the relevant source code. Some bits may be omitted because this is code from a larger project and I didn’t want to track down all the dependencies.
As a warning, Jane Street’s libraries tend to be written with performance of the native compiler in mind. Your code ends up using a lot of abstractions that are (almost) free with the native compiler (with flambda) but expensive with bytecode.
Iterating over In_channel.input_line manually in a while loop will likely get you much better performance.
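A minimal sketch of that approach using only the stdlib (the per-line function process is hypothetical; it stands in for whatever parsing you do per line):

```ocaml
(* Sketch: iterate over lines manually with a while loop, avoiding the
   intermediate string list that In_channel.read_lines builds.
   [process] is a hypothetical callback applied to each line. *)
let iter_lines process filename =
  let ic = open_in filename in
  try
    while true do
      process (input_line ic)
    done
  with
  | End_of_file -> close_in ic
  | exn -> close_in ic; raise exn
```

The point is that each line string becomes garbage immediately after processing, instead of an entire file’s worth of lines being held live at once.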
My blind guess is that float parsing is very expensive in both languages, possibly slower in OCaml.
Your parse_line also seems suboptimal; I would try Scanf.sscanf str " %d %d %f %f %f %f %r " (fun buf -> Scanf.bscanf_opt buf "%d" (fun i -> i)) (fun id nodetype x y z r opid -> ... )
I do not know Core very well, but isn’t Map a persistent map based on AVL trees (like those of the standard library)?
If so, you pay a logarithmic cost to insert a binding, while dictionaries in Python are hash tables.
You should switch to Hashtbl if that’s the case.
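For concreteness, a minimal sketch of the two stdlib containers being contrasted (Core’s interfaces differ slightly, but the asymptotics are the same): Map.add is O(log n) and returns a new map, while Hashtbl.replace mutates in place in amortized O(1), matching Python’s dict.

```ocaml
(* Persistent AVL-based map: each insert is O(log n) and allocates
   a rebalanced path of tree nodes. *)
module IntMap = Map.Make (Int)

let build_map n =
  let rec go i acc =
    if i >= n then acc else go (i + 1) (IntMap.add i (i * i) acc)
  in
  go 0 IntMap.empty

(* Mutable hash table: each insert is amortized O(1), like a
   Python dict keyed by the integer identifier. *)
let build_tbl n =
  let tbl = Hashtbl.create n in
  for i = 0 to n - 1 do
    Hashtbl.replace tbl i (i * i)
  done;
  tbl
```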
Also, you seem to go through a lot of small allocations (using filter_map and such), allocating many intermediate values. Over 100,000 lines these may start to trigger the garbage collector.
It also seems that building the intermediate lists is unnecessary (and I’d be surprised if you built those in Python; did you write your whole Python code with list comprehensions?).
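A sketch of the difference using the stdlib Seq module: the pipeline below consumes elements lazily and folds them into a result without ever materializing an intermediate list, unlike an equivalent chain of List.filter_map calls (the function itself is a made-up example):

```ocaml
(* Sum the squares of the even integers below [upto].
   Seq.filter and Seq.fold_left process one element at a time,
   so no intermediate list of survivors is ever allocated. *)
let sum_even_squares upto =
  Seq.ints 0
  |> Seq.take upto
  |> Seq.filter (fun i -> i mod 2 = 0)
  |> Seq.fold_left (fun acc i -> acc + (i * i)) 0
```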
These are all good observations. I have tried a few permutations of this: using lists or Sequences, and mutable Hashtbls vs. persistent Maps. For the reasons you pointed out, it seems most efficient, in terms of not introducing ancillary data structures (deforestation), to use only Sequences (except for the list of strings returned by In_channel.read_lines) and then build the Hashtbl incrementally, one item at a time, while enumerating the final Sequence. This version is at the end of the pastebin code. For bytecode, the difference between these variants is small compared to the factor-of-2 difference with Python.
There are no lists in the Python version, only iterators.
I’m not sure I follow the code, but in the end, do you build a sequence of hash tables, one per file in a directory, and print the id (key) of one entry of each table? If so, in your final function, getting the set of keys as a list only to take the first element seems inefficient.
Yes, indeed. The code is purely for benchmarking, printing the id to the console is only so I could watch the performance of the program in real time. There are other things that I need the whole set of keys for.
Hi Chet;
It’s from an existing Python library that I wrote. I am experimenting with porting some small fractions of it into OCaml as a learning exercise.
I called the function read_swc_node_dict on each file.
I’ve updated your code with how I would implement it. I don’t have access to the interface of the Swc module so I’ve made guesses about its types.
I haven’t tested it yet so there might be a few things to tweak (maybe some more exceptions need to be caught, maybe the Scanf.bscanf line needs to be tweaked).
If you can get it to work, I’m curious to know what is its performance on your inputs.
module Read_swc : sig
  type swcline' =
    { id : int;
      nodetype : int;
      x : float;
      y : float;
      z : float;
      r : float;
      parent_id : int option
    }

  val read_file : string -> (int, swcline') Hashtbl.t
end = struct
  type swcline' =
    { id : int;
      nodetype : int;
      x : float;
      y : float;
      z : float;
      r : float;
      parent_id : int option
    }

  let read_file filename =
    let tbl = Hashtbl.create 100 in
    let ic = Scanf.Scanning.open_in filename in
    try
      while true do
        (* Test for comment *)
        let is_comment =
          Scanf.bscanf ic "%0c" (function '#' -> true | _ -> false)
        in
        if is_comment then
          Scanf.bscanf ic "#%s@\n" (fun _line -> ())
        else
          Scanf.bscanf ic " %d %d %f %f %f %f %d \n"
            (fun id nodetype x y z r parent ->
               let parent_id = if parent > 0 then Some parent else None in
               let node = { id; nodetype; x; y; z; r; parent_id } in
               Hashtbl.add tbl id node)
      done;
      assert false
    with
    | End_of_file -> Scanf.Scanning.close_in ic; tbl
    | exn -> Scanf.Scanning.close_in ic; raise exn
end
open Swc.Parse
open Swc.Batch

let () = print_endline "Hello, World!"

let dir = Sys.argv.(1)

let hashtbl_seq = map_over_dir Read_swc.read_file Fun.id dir

let () =
  Core.Sequence.iter hashtbl_seq ~f:(fun a ->
      let ell = Hashtbl.to_seq_keys a in
      match Seq.uncons ell with
      | None -> failwith "Empty table"
      | Some (id, _) -> Printf.printf "%d\n" id)
Is this a fair version of your Python code? I’ve stripped out a bunch of stuff to get a standalone test. I’d like to understand precisely what you ran in Python before proceeding to the OCaml code.
file: py1.py
r"""
Definition of a NeuronNode, NeuronTree and SWCForest class for representing the internal contents \
of an \*.swc file. Basic functions for manipulating, examining, validating and \
filtering \*.swc files. A function for reading \*.swc files from memory.
"""
from __future__ import annotations
import argparse
import os
import sys
from dataclasses import dataclass
from typing import Callable, Iterator, Literal, Container, Optional
@dataclass
class NeuronNode:
    r"""
    A NeuronNode represents the contents of a single line in an \*.swc file.
    """

    sample_number: int
    structure_id: int
    coord_triple: tuple[float, float, float]
    radius: float
    parent_sample_number: int

    def is_soma_node(self) -> bool:
        return self.structure_id == 1
def read_swc_node_dict(file_path: str) -> dict[int, NeuronNode]:
    r"""
    Read the swc file at `file_path` and return a dictionary mapping sample numbers \
    to their associated nodes.

    :param file_path: A full path to an \*.swc file. \
        The only validation performed on the file's contents is to ensure that each line has \
        at least seven whitespace-separated strings.
    :return: A dictionary whose keys are sample numbers taken from \
        the first column of an SWC file and whose values are NeuronNodes.
    """
    nodes: dict[int, NeuronNode] = {}
    with open(file_path, "r") as file:
        for n, line in enumerate(file):
            if line[0] == "#" or len(line.strip()) < 2:
                continue
            row = line.strip().split()[0:7]
            if len(row) < 7:
                raise TypeError(
                    "Row "
                    + str(n)
                    + " in file "
                    + file_path
                    + " has fewer than seven whitespace-separated strings."
                )
            nodes[int(row[0])] = NeuronNode(
                sample_number=int(row[0]),
                structure_id=int(row[1]),
                coord_triple=(float(row[2]), float(row[3]), float(row[4])),
                radius=float(row[5]),
                parent_sample_number=int(row[6]),
            )
    return nodes
parser = argparse.ArgumentParser(
    prog='Py1',
    description='What the program does',
    epilog='Text at the bottom of help',
)
parser.add_argument('filename')  # positional argument
parser.add_argument('-v', '--verbose', action='store_true')  # on/off flag
args = parser.parse_args()
print(args.filename, args.verbose)

d = read_swc_node_dict(args.filename)
print(len(d))