Sparse linear solve, or: how to use C/Python/... code from OCaml

Hi, I need to efficiently solve a sparse linear system in OCaml. The state of the art libraries for doing this are UMFPACK/SuiteSparse and SuperLU. These backends are integrated in various numerical libraries including Eigen (C++) and SciPy (Python).

Any ideas what might be the fastest way to use these libraries from OCaml? I am aware of Owl, which includes bindings to Eigen but unfortunately does not seem to include support for sparse solvers at the moment. How difficult is it to interface with Python code? I would like to get this up and running as quickly as possible, and I am wondering what would be the path of least resistance.

Hi,

I’m not very experienced with OCaml but I’ve been binding parts of SuiteSparse in another language (in Rust), so maybe you’ll find my point of view useful.

From what I’ve seen, SuiteSparse is pretty easy to bind since it mostly requires simple types such as pointers to integers and pointers to floats. So you probably want to store your sparse matrix using some bigarrays and pass their pointers to SuiteSparse’s functions.

I’m not sure interfacing with python would be easy, it’s not easy to interface two garbage collected languages, and you would have to bind more or less the same functions.

2 Likes

We have an internal project using py.ml to bind to some NLP libraries and existing custom Python code. The trickiest part so far has been making the sure environment is setup properly so py.ml has access to the expected libraries (managed by Python’s virtualenv). py.ml itself has been fairly easy to work with after learning a bit about Python’s internal data representation.

2 Likes

Thanks for the replies! Given that I’m looking at sparse systems and there is hopefully not too much data to shuffle around, I am now thinking the simplest thing may be to run the solver in a separate process and use named pipes to send the data back and forth. I don’t have a good sense of how big the serialization overhead will be, but given that I’m talking about sparse matrices I have hope that it may be fairly small.

1 Like

Those tend to be messy on Unix. I recommend normal pipes or the socketpair(2) system call if you need bidirectionality.

If you are worried about serialization overhead, mmap a common structure between the two processes and use the sockets for communicating only that data is in a stable state for a reader.

Could you elaborate a bit on why you think named pipes may be messy?

I’m the OCaml newb working with hcarty on this Python + py.ml stuff. The complexity is in wrapping and unwrapping the pythonic variables as needed. Initializing objects is also fun. Just remember you don’t get any magic objects, only what you ‘let’ will exist for the scope you have it in.

1 Like

The API on the Unix side kind of sucks. I’ve gotten burned by them a few times. That said, if they do what you prefer, try them.

it might make sense to make use of the sparse matrix types in owl which are made to directly use Eigen sparse functions. (disclaimer: have not used them)

@n4323: Unfortunately bindings to Eigen’s sparse solver functions are currently missing :frowning:

@rseymour: What’s the difference between py.ml and lymp?

@smolkaj lymp spawns a separate Python process, serializing data back and forth between the OCaml parent process and the child Python process. py.ml loads/embeds the Python runtime into an OCaml process as a .so or platform equivalent.

1 Like

yes, i just meant, starting from the pre-defined sparse array types from owl, then wrapping the eigen sparse solvers yourself. this might save a little bit of work?

Oh, that’s a great idea actually!

or even a PR against owl? seems like they may be interested…

Yes, in case I get around to writing bindings I will definitely submit a PR. For now I found a short term solution via Python.

In case it helps, it looks like py.ml recently got support for zero-copy sharing between OCaml and Python using bigarrays + numpy. See the updated api docs here.

Just for the record in case anyone is curious: for my purposes, running Python in a seperate process and communicating via pipes did the trick. The only dependency is findlib to loacte the python script at runtime. Here is how it looks:

(* OCAML CODE THAT INVOKES PYTHON SCRIPT *)
(* setup external python script to solve linear system, start in seperate process *)
let pkg_name = "my_package" in
let script_name = "my_script.pyc" in
let pyscript = match Findlib.package_directory pkg_name with
  | dir ->
    dir ^ "/" ^ script_name
  | exception Findlib.No_such_package _ ->
    failwith ("missing ocamlfind dependency: " ^ pkg_name)
in
let cmd = "python3 " ^ pyscript in
let (from_py, to_py) = Unix.open_process cmd in
...

The python script reads its inputs from sys.stdin and writes the outputs to stdout using print. Here is the jbuild file for the python script:

(jbuild_version 1)

;; this precompiles the Python script to bytecode
(rule
 ((targets  (my_script.pyc))
  (deps     (my_script.py))
  (action   (run python3 -m compileall -b ${^}))
 )
)

;; this installs the compiled script in the library folder of "my_package"
;; there it can be located reliably at runtime using ocamlfind
(install
  ((section lib)
   (files   (my_script.pyc))
   (package my_package)
  ))

On the OCaml side, I use Jane Street’s stdio (included in core) for conveniently communicating with the script:

(* send line to python script *)
Out_channel.fprintf to_py "some number: %d\n" 5

(* receives lines in loop *)
begin try while true do
  let line = In_channel.input_line_exn from_py
  (* do something with the line *)
  ... 
done with
  | End_of_file -> ()
end;

Not too bad! This works reliably on Linux and MacOS.

Update: the full code is here https://github.com/smolkaj/ocaml-python

1 Like