Week 44: what's everyone hacking on this week?

Yes, the mechanism to cross-compile will be the same. When cross-compiling to a different architecture you have an orthogonal problem: you need a retargeted OCaml compiler. whitequark has some repositories where this is achieved through patches, for example. But for something more polished, we’d probably need opam support for building retargeted compilers.

1 Like

I was away early this week talking about generative testing, fuzzers, and CI at a local DevOps conference (where a surprising number of people were a little familiar with OCaml!), hence the late reply!

I’m using gdb for the first time in anger with OCaml, digging into a segmentation fault I discovered late last week that surfaces when running the ocaml-test-stdlib Crowbar tests with Crowbar’s latest version. My hands remember a bit of how to use gdb as a C REPL, but my brain seems not to be involved. Luckily for me, I found this bit of explanatory ephemera on where function arguments go…

6 Likes

You might want to take a look at my aws-s3 library (https://github.com/andersfugmann/aws-s3), an OCaml library to access Amazon S3 which implements basic operations such as get, put, ls, and rm. It’s loosely based on some examples in the cohttp library, and uses Async for concurrency.

1 Like

I’m trying to get the new Camelus working, with the help of @dinosaure. The point is to have it automatically update a 2.0.0 branch of the repository on every push (or PR merge) to master.

I also added package IDs in opam, and have a working prototype of binary caching (storing/restoring the installed tree of each package when it is built with the exact same dependencies and versions).
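The core idea of the cache key, as a hypothetical sketch in plain OCaml (not opam’s actual code; names and helpers are illustrative):

```ocaml
(* Hypothetical sketch, not opam's actual code: a cache key under which
   the installed tree of a package can be stored and restored. The tree
   is reusable only when the name, version, and the exact resolved
   dependency set hash to the same key. *)
let cache_key ~name ~version ~deps =
  deps                                    (* (name, version) pairs *)
  |> List.sort compare                    (* order-independent key *)
  |> List.map (fun (n, v) -> n ^ "." ^ v)
  |> String.concat ";"
  |> Printf.sprintf "%s.%s[%s]" name version
  |> Digest.string
  |> Digest.to_hex

let () =
  cache_key ~name:"lwt" ~version:"3.1.0"
    ~deps:[ "ocaml", "4.05.0"; "result", "1.2" ]
  |> print_endline
```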

3 Likes

Apologies for the late reply; I was in Shanghai to attend SOSP, where we presented Owl-related work to the systems community.

Yes, Owl already has AD, on top of which we have also built a quite comprehensive DNN module. One student in the lab built a cool example for image classification, which you can try here: http://138.68.155.178/
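For a flavour of the AD module, a minimal sketch along the lines of Owl’s documented Algodiff.D API (module paths may differ between Owl versions):

```ocaml
(* Minimal sketch based on Owl's Algodiff.D API; paths may differ
   between versions. *)
open Owl
module AD = Algodiff.D

let f x = AD.Maths.(sin x * cos x)        (* built from AD-aware ops *)

let () =
  let f' = AD.diff f in                   (* derivative of f *)
  Printf.printf "f'(1.0) = %g\n" (AD.unpack_flt (f' (AD.F 1.0)))
```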

Currently, I am working on the OpenCL module. It hasn’t been merged into the master branch, but in case you are interested you can have a look at the gpu branch in the GitHub repo. Owl will first support MATLAB-like (or PyTorch-like) GPU computing, i.e. precompiling a set of frequently used kernels (add/sub/mul/div/sin …); code generation requires metaprogramming and will be the next step. I am hoping that by the end of this year I can finish the alpha version and make some showcases.
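To illustrate the two approaches (purely hypothetical identifiers, not the gpu branch’s actual API):

```ocaml
(* Hypothetical illustration of the "precompiled kernel set" idea;
   none of these identifiers come from Owl's gpu branch. *)
type elt_kernel = Add | Mul | Sin

(* OpenCL C source for each element-wise kernel; in this scheme every
   kernel is compiled once at start-up and afterwards only looked up. *)
let source = function
  | Add ->
      "__kernel void add(__global const float *x, __global const float *y, \
       __global float *z) { size_t i = get_global_id(0); z[i] = x[i] + y[i]; }"
  | Mul ->
      "__kernel void mul(__global const float *x, __global const float *y, \
       __global float *z) { size_t i = get_global_id(0); z[i] = x[i] * y[i]; }"
  | Sin ->
      "__kernel void sin_k(__global const float *x, __global float *z) \
       { size_t i = get_global_id(0); z[i] = sin(x[i]); }"

(* Code generation, by contrast, would emit and compile fused kernels at
   run time, e.g. z[i] = sin(x[i]) * y[i], instead of chaining lookups. *)
```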

5 Likes

Yep, I will start focusing on GPGPU code generation by the end of this year, after the Lazy and View modules are finished. We have four students working on Owl-related stuff this term: 1) linear typing; 2) adaptive learning in Owl + Actor; 3) a pure OCaml backend for deployment in the browser; 4) Zoo for service composability … so hopefully things can progress faster :slight_smile:

4 Likes

Some naive questions:

  1. How usable is the GPU branch? It appears to implement very primitive functions (sin, cos) but no matrix operations(?)
  2. Is it usable with OpenCL on Mac?
  3. Not related to the GPU, but the README mentions Owl uses OpenBLAS. Can Owl be used with Apple’s BLAS (I think it’s supposed to be faster on Mac)?

I use an NVIDIA GPU for most GPU computations, but if I could tap the GPU on a Mac via Owl it would be very interesting.

1 Like

I’m working on small libraries that make up a larger project for software-defined overlays, covering a variety of the common distributed computing patterns.

I.e. master-slave replication, process groups, service registries, deployment, resource monitoring, a bunch of routing and peer selection strategies, etc.

Some are in OCaml, others aren’t; it’s kind of a clusterfuck of a project I’ve been working on in the background for a while. I’m thinking about writing a bunch of lexers and parsers to generate Finagle RPCs in Scala from OCaml, and one that converts Finagle services and filters to appropriate Lwt handlers.

1 Like

There is also a tensorflow-ocaml wrapper, although I never tried to write anything more or less complex using it.
P.S. In any case, looking forward to seeing Owl’s dominance :slight_smile:

2 Likes

Hacking on AD-OCaml today, an algorithmic differentiation framework for OCaml. Not sure it makes sense to give more details, since I don’t have any plans to release it in the foreseeable future, but here is a progress report anyway.

Since my last presentation at the Compose conference last year, the framework has matured greatly and is probably stable enough for most non-life-critical production tasks. The most important differences from other frameworks:

  • “True” algorithmic differentiation, i.e. does not require explicit graph construction by the user as required by e.g. TensorFlow or Theano. Essentially all of OCaml is supported.
  • Full support for imperative operations on bigarrays (vectors, matrices), including many BLAS/LAPACK functions for in-place updates. Most AD tools or libraries either do not support imperative operations at all or have serious scalability issues when they do (that, sadly, also includes Owl).
  • Arbitrary nesting of derivative operators (see the sketch after this list). This, again, is rarely supported, because it is quite intimately related to the previous item: reverse-mode AD turns reads into writes, which, in the absence of imperative support, either imposes serious restrictions on the functionality the framework can offer or infeasibly large performance penalties. AD-OCaml can even calculate derivatives of itself - and of the derivative of the derivative of itself, etc. :slight_smile:
  • Full support for aliasing (think Mat.as_vec mat or Mat.col mat 42), preserving the semantics of imperative operations on them.
  • Quadratic instead of exponential time approximation of programs via UTPs (Univariate Taylor Polynomials = power series). UTP support requires hand-implementation of pretty horrendously complicated convolution algorithms.
  • Implicit task and data parallelism: all expensive purely functional operations are automatically parallelized, as are imperative operations on independent values. Imperative operations on the same value can be explicitly parallelized by the user via fork/join operators. This parallelism also propagates through derivative operators, i.e. gradient updates run in parallel, too.
  • Visualization of program traces as well as their AD-transformations via Graphviz.
  • Extensibility. You can add more operators without having to modify the library.
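To make the nesting point concrete, here is a toy forward-mode sketch (not AD-OCaml’s code, which is unreleased; a real implementation must also guard against perturbation confusion, which this toy ignores):

```ocaml
(* Toy forward-mode AD that supports nested derivative operators.
   Illustrative only: AD-OCaml is unreleased and far more sophisticated
   (reverse mode, imperative ops, perturbation-confusion handling, ...). *)
type t = Const of float | Dual of t * t  (* value, tangent *)

let rec add a b = match a, b with
  | Const x, Const y -> Const (x +. y)
  | Dual (x, dx), Dual (y, dy) -> Dual (add x y, add dx dy)
  | Dual (x, dx), c | c, Dual (x, dx) -> Dual (add x c, dx)

let rec mul a b = match a, b with
  | Const x, Const y -> Const (x *. y)
  | Dual (x, dx), Dual (y, dy) -> Dual (mul x y, add (mul dx y) (mul x dy))
  | Dual (x, dx), c | c, Dual (x, dx) -> Dual (mul x c, mul dx c)

(* d/dx of f at x: seed the tangent with 1 and read it back. *)
let diff f x =
  match f (Dual (x, Const 1.)) with
  | Dual (_, d) -> d
  | Const _ -> Const 0.

let rec to_float = function Const x -> x | Dual (x, _) -> to_float x

let () =
  (* nesting: the second derivative of x^3 at 2.0 is 6 * 2.0 = 12 *)
  let f x = mul x (mul x x) in
  Printf.printf "%g\n" (to_float (diff (diff f) (Const 2.)))
```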

Besides the above features, the framework already supports the vast majority of float operations (including trigonometric, hyperbolic, etc.), also for tensors (vectors, matrices), as well as a decent chunk of BLAS/LAPACK functions (including some matrix factorization algorithms, which I’m incidentally working on today).

A large test suite covers all operations and many application corner cases. Verifying the consistency of UTP operations in particular is extremely laborious and time-consuming. Some of the more complex LAPACK functions take more than an hour of CPU time to pass all tests!

Recent updates to Lacaml add better support for SIMD optimizations as well as for parallelizing operations and are exploited by AD-OCaml.

There is still quite some work to be done, but I think the framework could already realistically compete with (if not outcompete) most alternatives, also performance-wise, on multicore CPUs. Most existing frameworks perform especially atrociously on individual floating-point operations. I have seen differences of 3-4 orders of magnitude for individual FLOPs with TensorFlow / Python (the OCaml bindings crashed, which seems to be a problem with TensorFlow, not the bindings :wink: ).

GPU support is not yet in the works, but it wouldn’t be that hard to add, as the important AD functionality has already been functorized over arbitrary types. E.g. vector and matrix operations, which can take different numbers and kinds of arguments, use the same functor to instantiate AD functionality. GPU values would work the same way, i.e. reverse-mode AD and UTP algorithms would come “for free” as soon as you provide a suitable set of GPU base functions.
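A hypothetical sketch of the functorization idea (AD-OCaml is unreleased, so these signatures are illustrative, not its actual code):

```ocaml
(* Hypothetical sketch: AD written once as a functor over base operations.
   These names are illustrative only, not AD-OCaml's real API. *)
module type Base = sig
  type t                      (* plain floats, bigarrays, or GPU buffers *)
  val zero : t
  val one  : t
  val add  : t -> t -> t
  val mul  : t -> t -> t
  val sin  : t -> t
  val cos  : t -> t
end

module Make_ad (B : Base) = struct
  type t = { v : B.t; dv : B.t }          (* value and tangent, both in B.t *)
  let const v = { v; dv = B.zero }
  let add a b = { v = B.add a.v b.v; dv = B.add a.dv b.dv }
  let mul a b =
    { v = B.mul a.v b.v;
      dv = B.add (B.mul a.dv b.v) (B.mul a.v b.dv) }
  let sin a = { v = B.sin a.v; dv = B.mul (B.cos a.v) a.dv }
  (* forward mode shown for brevity; the same pattern lifts reverse-mode
     AD and UTP code over any Base, including a GPU-backed one *)
  let diff f x = (f { v = x; dv = B.one }).dv
end

(* A CPU instantiation; a GPU backend would provide the same signature. *)
module Float_base = struct
  type t = float
  let zero = 0. and one = 1.
  let add = ( +. ) and mul = ( *. )
  let sin = sin and cos = cos
end

module AD = Make_ad (Float_base)
let () = Printf.printf "%g\n" (AD.diff AD.sin 0.)  (* cos 0. = 1 *)
```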

9 Likes

  1. I don’t recommend using the gpu branch at this point, since we are still shaping the APIs. Many kernels are missing, e.g. matrix multiplication, as you have noticed. When the architecture is solid enough, adding more kernels won’t be a problem. The current plan is to release a usable alpha version next year.
  2. Yes, it is; it should be compatible with OpenCL 1.2.
  3. That should be possible as long as the API is compatible with https://github.com/ryanrhymes/owl/blob/master/src/owl/cblas.h; it would just need to be linked against a different library, IMO.

:slight_smile: yep, we are working quite hard on providing a set of convenient interfaces to the DNN module. Owl’s DNN APIs are already quite easy to use; for a simple example you can check: https://github.com/ryanrhymes/owl/blob/master/examples/cifar10_vgg.ml
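The construction style looks roughly like this (adapted from the style of Owl’s examples; the layer shapes here are illustrative):

```ocaml
(* Roughly the style of Owl's DNN API, adapted from its examples;
   the exact layer shapes here are illustrative. *)
open Owl
open Neural.S
open Neural.S.Graph

let make_network () =
  input [|32; 32; 3|]                       (* CIFAR-10 image shape *)
  |> conv2d [|3; 3; 3; 32|] [|1; 1|] ~act_typ:Activation.Relu
  |> max_pool2d [|2; 2|] [|2; 2|]
  |> fully_connected 512 ~act_typ:Activation.Relu
  |> linear 10 ~act_typ:Activation.Softmax  (* 10 classes *)
  |> get_network
```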

For a more complicated one, like Google Inception V3 in Owl, here is an online demo with code: http://138.68.155.178

4 Likes