Ocaml Bytecode performance?

I have three questions.

  1. How good is Ocaml’s bytecode performance compared to other interpreted languages, such as Java? What’s the right comparison/benchmark, like what are the closest neighbors to Ocaml’s bytecode interpreter in terms of performance? I would guess it’s faster than Python, Ruby and R but I would like to hear people confirm this.

  2. How would you contrast the performance of an Ocaml bytecode program with the native program? In general we should expect the native code to be faster, but how much faster? I am sure it depends on a lot of factors, what factors are most important? Where should we expect to get a big speedup/small speedup?

  3. I just did an experiment where I read a bunch of text files. The compiled version took 3.9 seconds and the bytecode version took 15.9 seconds. That catches me a little bit off guard. I somehow find it surprising that bytecode vs compiled would make a difference for the act of reading files. Intuitively, since the lion’s share of the time in the program is not spent computing but merely waiting on system resources, I would have expected this to not vary much. In both cases you tell the system you want to read lines from a text file, and you wait for the system to deliver the data from memory. Why is compiled vs interpreted a big deal in how long that request takes to fulfill?

3 Likes

For Q1 have I done a simple comparison with Python as follows. I read a comma-separated-value text file with hundreds to thousands of lines, using Core In_channel.read_lines where each line has a unique integer identifier and 8 fields. I build a record from each line, containing those 8 fields, and assemble them together into an Ocaml (Jane Street Core) Map or a Hashtbl/ a Python dict, respectively, where the integer identifier is the index for the record. I have tried using both Ocaml’s lists and (Core) Sequences to tweak the performance. No matter how I write the Ocaml code, the Python is substantially faster than the Ocaml bytecode.

  • Python: Wall time 10s, sys time 0.4s
  • Ocaml bytecode: Wall time 23s, sys time 1.2s
  • Ocaml native: Wall time 4.7s, sys time 0.040s

It is hard to believe that Python beats Ocaml bytecode in file IO and in sys time! I am happy to take suggestions about how I could rearchitecture the Ocaml program for improved performance. It seems like the mutable Ocaml hash table to Python dictionary comparison should be apples-to-apples, but I could be wrong about this.

1 Like

It depends a lot on your program. As a general rule, the difference will be lower for IO-bound programs and larger for CPU-bound programs. Speedups of ~2x to ~10x are typical, depending on the program payload.

I suggest you post the program you are benchmarking somewhere, otherwise it is impossible to explain the results you are seeing…

Cheers,
Nicolas

1 Like

For parsing csv files, the fastest approach in OCaml is likely to write a lexer and parser (using ocamllex and menhir).

I don’t know how your Python code is written, but if you’re relying on an external library it’s likely much faster than if you had to write it by hand (which is what you’re doing for OCaml).

OCaml bytecode is decent, but it’s interpreted. Python is also interpreted, but contains countless shortcuts to C to accelerate its performance. OCaml doesn’t do this. Java and Javascript use JITs so they will blow away OCaml’s performance.

1 Like

Hi Nicolas;
@nojb
Yes, of course. Here is the relevant source code. Some bits may be omitted because this is code from a larger project and I didn’t want to track down all the dependencies.

I am using this Python library to read the text files. It very well be built on C code.

I don’t think I’m using any other libraries worth mentioning. Only standard Python data structures such as dicts and classes.
Here is the format of the files I’m parsing, as I described above.
http://www.neuronland.org/NLMorphologyConverter/MorphologyFormats/SWC/Spec.html

2 Likes

As a warning, Jane Street’s libraries tend to be written with performance of the native compiler in mind. Your code ends up using a lot of abstractions that are (almost) free with the native compiler (with flambda) but expensive with bytecode.
Iterating over In_channel.input_line manually in a while loop will likely get you much better performance.

1 Like

My blind guess is that float parsing is very expensive in both languages, possibly slower in OCaml.

Your parse_line also seems unoptimal, I would try Scanf.sscanf str " %d %d %f %f %f %f %r " (fun buf -> Scanf.bscanf_opt buf "%d" (fun i -> i)) (fun id nodetype x y z r opid -> ... )

3 Likes

I do not know Core very well, but aren’t Map persistent maps based on AVLs (like those of the standard library ?).

If so you have to pay log to insert a binding while dictionaries in Python are hash tables.

You should switch to Hashtbl if that’s the case.

Also you seem to go through a lot of small allocations (using filter_map and such) allocating a lot of intermediary values. Over 100 000 lines these may start to trigger the garbage collector.

It also seems that the building of intermediary lists is unnecessary (and I’d be surprised if you built those in Python, did you write your whole Python code with list comprehension?).

1 Like

These are all good observations. I have tried a few permutations of this, using lists or Sequences, using mutable Hashtbls vs persistent Maps. These functions are listed For the reasons you pointed out, it seems like using only Sequences (except for the list of strings returned by In_channel.read_lines) and then incrementally building the Hashtbl one item at a time while enumerating through the final Sequence would be most efficient in terms of not introducing ancillary data structures/deforesting. This is at the end of the pastebin code. For bytecode the difference is small compared to the factor of 2 difference w/ Python.

There are no lists in the Python version, only iterators.

I’m not sure I follow the code but in the end, do you build a sequence of hashtables, one per file in a directory ? And you print the id (key) of one of the entries of each tables. If so, in your final function, getting the set of keys as a list, only to take the first element seems inefficient.

1 Like

Yes, indeed. The code is purely for benchmarking, printing the id to the console is only so I could watch the performance of the program in real time. There are other things that I need the whole set of keys for.

Can you share your Python code, and the input files (or at least, some of them)? And how you invoke (just for completeness)?

3 Likes

Hi Chet;
It’s from an existing Python library that I wrote. I am experimenting with porting some small fractions of it into Ocaml as a learning exercise.
I called the function read_swc_node_dict on each file.

This code should work on Python version >= 3.9.
Sample files are here https://github.com/CamaraLab/CAJAL/tree/main/CAJAL/data/swc

The code was run from a Jupyter notebook, using IPython.

I’ve updated your code with how I would implement it. I don’t have access to the interface of the Swc module so I’ve made guesses about its types.
I haven’t tested it yet so there might be a few things to tweak (maybe some more exceptions need to be caught, maybe the Scanf.bscanf line needs to be tweaked).
If you can get it to work, I’m curious to know what is its performance on your inputs.

module Read_swc : sig
  type swcline' =
  { id : int;
    nodetype : int;
    x : float;
    y : float;
    z : float;
    r : float;
    parent_id : int option
  }

  val read_file : string -> (int, swcline') Hashtbl.t
end =
struct
  type swcline' =
    { id : int;
      nodetype : int;
      x : float;
      y : float;
      z : float;
      r : float;
      parent_id : int option
    }

  let read_file filename =
    let tbl = Hashtbl.create 100 in
    let ic = Scanf.Scanning.open_in filename in
    try
      while true do
        (* Test for comment *)
        let is_comment = Scanf.bscanf ic "%0c" (function '#' -> true | _ -> false) in
        if is_comment then
          Scanf.bscanf ic "#%s@\n" (fun _line -> ())
        else
          Scanf.bscanf ic " %d %d %f %f %f %f %d \n"
            (fun id nodetype x y z r parent ->
               let parent_id = if parent > 0 then Some parent else None in
               let node = { id; nodetype; x;y;z;r; parent_id } in
               Hashtbl.add tbl id node)
      done;
      assert false
    with End_of_file -> Scanf.Scanning.close_in ic; tbl
       | exn -> Scanf.Scanning.close_in ic; raise exn
end

open Swc.Parse
open Swc.Batch

let () = print_endline "Hello, World!"
let dir = Sys.argv.(1)


let hashtbl_seq = map_over_dir Read_swc.read_file
        Fun.id dir

let () = Core.Sequence.iter hashtbl_seq ~f:(fun a ->
    let ell = Hashtbl.to_seq_keys a in
    match Seq.uncons ell with
    | None -> failwith "Empty table"
    | Some (id, _) ->
      Printf.printf "%d\n" id)
1 Like

Is this a fair version of your Python code ? I’ve stripped out a bunch of stuff, to get a standalone test. I’d like to understand precisely what you ran in Python, before proceeding to the OCaml code.

file: py1.py

r"""
Definition of a NeuronNode, NeuronTree and SWCForest class for representing the internal contents \
of an \*.swc file. Basic functions for manipulating, examining, validating and \
filtering \*.swc files. A function for reading \*.swc files from memory.
"""
from __future__ import annotations

import argparse
import os
import sys
from dataclasses import dataclass
from typing import Callable, Iterator, Literal, Container, Optional

@dataclass
class NeuronNode:
    r"""
    A NeuronNode represents the contents of a single line in an \*.swc file.
    """
    sample_number: int
    structure_id: int
    coord_triple: tuple[float, float, float]
    radius: float
    parent_sample_number: int

    def is_soma_node(self) -> bool:
        return self.structure_id == 1



def read_swc_node_dict(file_path: str) -> dict[int, NeuronNode]:
    r"""
    Read the swc file at `file_path` and return a dictionary mapping sample numbers \
    to their associated nodes.

    :param file_path: A full path to an \*.swc file. \
    The only validation performed on the file's contents is to ensure that each line has \
    at least seven whitespace-separated strings.

    :return: A dictionary whose keys are sample numbers taken from \
    the first column of an SWC file and whose values are NeuronNodes.
    """
    nodes: dict[int, NeuronNode] = {}
    with open(file_path, "r") as file:
        for n, line in enumerate(file):
            if line[0] == "#" or len(line.strip()) < 2:
                continue
            row = line.strip().split()[0:7]
            if len(row) < 7:
                raise TypeError(
                    "Row "
                    + str(n)
                    + " in file "
                    + file_path
                    + " has fewer than seven whitespace-separated strings."
                )
            nodes[int(row[0])] = NeuronNode(
                sample_number=int(row[0]),
                structure_id=int(row[1]),
                coord_triple=(float(row[2]), float(row[3]), float(row[4])),
                radius=float(row[5]),
                parent_sample_number=int(row[6]),
            )
    return nodes

parser = argparse.ArgumentParser(
                    prog='Py1',
                    description='What the program does',
                    epilog='Text at the bottom of help')

parser.add_argument('filename')           # positional argument
parser.add_argument('-v', '--verbose',
                    action='store_true')  # on/off flag

args = parser.parse_args()
print(args.filename, args.verbose)

d = read_swc_node_dict(args.filename)
print(len(d))

and I ran it thus:

$ python ./py1.py CAJAL/CAJAL/data/swc/354833767.swc
CAJAL/CAJAL/data/swc/354833767.swc False
2101

Yes, I think that’s correct.

Can you tell me what was the input that produced these timings?