OCaml on Cuda: Spoc, Sarek, Matrix Multiply?

I have recently been looking at SPOC (mathiasbourgoin/SPOC on GitHub, “Stream Processing with OCaml”) for GPGPU programming in OCaml, and I have a few questions.

  1. Licensing. I’m not a lawyer, and this is not legal advice. Is it correct that CeCILL == GPL and CeCILL-B == BSD/MIT/Apache? This confuses me because the web refers to a “CeCILL family of licenses”, and in my mind GPL/LGPL/AGPL do not belong in the same ‘family’ as MIT/BSD/Apache.

  2. Naming. So in Cuda, we have two parts:

2.1. the *.cu files (actual cuda kernel), written in Cuda-C, compiled and run on the GPUs
2.2. some C/C++ wrapper that calls the Cuda drivers (on CPU side) to control stuff like: upload data to GPU, download data from GPU, call kernels on GPU variables, etc …

Is it correct that Sarek == the OCaml DSL for writing what would be the *.cu files, and SPOC == the C/C++ driver side, i.e. uploading/downloading data and running kernels?

  3. Assuming I’m right on #2, I’m looking at the Sarek demos in SPOC/SpocLibs/Benchmarks (master branch of mathiasbourgoin/SPOC on GitHub), and I’m not finding Matrix Multiply.

I am wondering if anyone has sample Sarek code for the matrix multiply example in the CUDA C++ Programming Guide.

I bring up this example for the following reason:

3.1 In Cuda sgemm, the multiplication is done in “tiles”, and there is the issue of picking tile sizes. In C/C++/Cu land, this means a mess of keeping track of quite a few constants on both the kernel side and the host side.

3.2 I’m wondering whether, in Spoc/Sarek, since the CU is generated at runtime, there is an easier/cleaner way to handle this, i.e. both Spoc and Sarek take the same tile_config as a parameter and generate “tile size compatible” code for the Spoc/Sarek interaction.

If I am understanding everything correctly, this could be a situation where Spoc/Sarek has “cleaner” code because everything is in one language.
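
To make 3.2 concrete, here is a minimal sketch of the idea in plain OCaml. It deliberately does not use the Spoc or Sarek API (I don’t know it well enough to write it); it only uses Printf string templating to emit Cuda-C. The names tile_config, kernel_source and launch_dims are made up for illustration, and the kernel assumes square n x n row-major matrices with n a multiple of the tile size.

```ocaml
(* Hypothetical sketch: none of these names come from SPOC or Sarek. *)

(* One record that both the kernel generator and the host side read. *)
type tile_config = { tile : int } (* edge length of a square tile *)

(* Device side: emit Cuda-C source for a tiled matrix multiply
   c = a * b, with the tile size baked in as a constant.
   Assumes n is a multiple of TILE (no bounds checks in the kernel). *)
let kernel_source (cfg : tile_config) : string =
  Printf.sprintf
    {|#define TILE %d
extern "C" __global__ void matmul(const float *a, const float *b,
                                  float *c, int n)
{
    __shared__ float sa[TILE][TILE];
    __shared__ float sb[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n / TILE; ++t) {
        sa[threadIdx.y][threadIdx.x] = a[row * n + t * TILE + threadIdx.x];
        sb[threadIdx.y][threadIdx.x] = b[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += sa[threadIdx.y][k] * sb[k][threadIdx.x];
        __syncthreads();
    }
    c[row * n + col] = acc;
}
|}
    cfg.tile

(* Host side: launch geometry derived from the same record. *)
type launch = {
  grid : int * int;   (* blocks in x and y *)
  block : int * int;  (* threads per block in x and y *)
  shared_bytes : int; (* shared memory used per block *)
}

let launch_dims (cfg : tile_config) ~(n : int) : launch =
  if cfg.tile <= 0 || n <= 0 || n mod cfg.tile <> 0 then
    invalid_arg "launch_dims: n must be a positive multiple of tile";
  let blocks = n / cfg.tile in
  { grid = (blocks, blocks);
    block = (cfg.tile, cfg.tile);
    (* two float tiles in shared memory, 4 bytes per float *)
    shared_bytes = 2 * cfg.tile * cfg.tile * 4 }

let () =
  let cfg = { tile = 16 } in
  let l = launch_dims cfg ~n:1024 in
  Printf.printf "grid=%dx%d block=%dx%d shared=%d bytes\n"
    (fst l.grid) (snd l.grid) (fst l.block) (snd l.block) l.shared_bytes;
  print_string (kernel_source cfg)
```

The point is only that the tile size is written once: changing cfg changes the generated kernel and the launch geometry together, so the two cannot drift apart. The shared memory in this kernel is statically sized, so shared_bytes is informational here (for example, to check against a per-block limit). What I don’t know is whether Spoc/Sarek already offers something like this directly.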

PS: If you have comments/insights on Spoc/Sarek, only tangentially related to the above, I’m interested in hearing them too.