Code LLM finetuned for OCaml / people's experiences with LLM code completion

Hi!

Has anyone used GitHub Copilot or an equivalent Large Language Model autocompletion / programmer’s support tool with OCaml? Does it help? Do the available tools even know OCaml?

Is anyone fine-tuning a Large Language Model on OCaml code?

Thanks!

Here are a couple of examples I tried with ChatGPT, copied verbatim below:

What’s a piece of OCaml code that illustrates the beauty of this language?

OCaml is a functional programming language that is known for its concise, expressive syntax and type system. Here is a short example that demonstrates some of the features that make OCaml a beautiful language:

(* define a recursive function to compute the factorial of a number *)
let rec factorial n =
  if n = 0 then
    1
  else
    n * factorial (n - 1)

(* define a function to compute the nth Fibonacci number *)
let rec fibonacci n =
  if n = 0 then
    0
  else if n = 1 then
    1
  else
    fibonacci (n - 1) + fibonacci (n - 2)

(* use pattern matching to define a function that returns the elements of a list in reverse order *)
let rec reverse = function
  | [] -> []
  | hd :: tl -> reverse tl @ [hd]

(* use currying to create a function that adds two numbers and a function that adds three numbers *)
let add x y = x + y
let add3 = add 3

(* use higher-order functions to create a function that applies a given function to each element of a list *)
let map f lst =
  let rec map_aux acc = function
    | [] -> List.rev acc
    | hd :: tl -> map_aux (f hd :: acc) tl
  in map_aux [] lst

(* define a polymorphic variant to represent the four suits of a deck of cards *)
type suit = Clubs | Diamonds | Hearts | Spades

(* use polymorphic variants and variant patterns to define a function that returns the name of a suit *)
let get_suit_name = function
  | Clubs -> "Clubs"
  | Diamonds -> "Diamonds"
  | Hearts -> "Hearts"
  | Spades -> "Spades"

(* use type inference to define a list of integers and a list of strings *)
let ints = [1; 2; 3; 4]
let strs = ["hello"; "world"]

(* use type annotations to specify the type of a function that converts a list of integers to a list of strings *)
let ints_to_strs (lst : int list) : string list =
  map string_of_int lst

(* use type inference to define a function that returns the length of the longest string in a list *)
let max_length lst =
  let lengths = map String.length lst in
  List.fold_left max 0 lengths

I hope this example gives you a sense of some of the features that make OCaml a powerful and expressive language!
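
(A note from me, not part of the ChatGPT output: the `suit` type above is an ordinary variant, even though the generated comments call it a polymorphic variant, and `reverse tl @ [hd]` takes quadratic time. Below is a minimal sketch of what I would have expected instead. The names `poly_suit`, `poly_suit_name` and `reverse_acc` are mine, chosen only for illustration.)

```ocaml
(* A polymorphic variant: constructors carry a backtick and the type is
   structural rather than nominal. *)
type poly_suit = [ `Clubs | `Diamonds | `Hearts | `Spades ]

(* Pattern matching on polymorphic variant tags. *)
let poly_suit_name : poly_suit -> string = function
  | `Clubs -> "Clubs"
  | `Diamonds -> "Diamonds"
  | `Hearts -> "Hearts"
  | `Spades -> "Spades"

(* Linear-time list reversal using an accumulator, instead of appending
   a singleton list at every step. *)
let reverse_acc lst =
  let rec go acc = function
    | [] -> acc
    | hd :: tl -> go (hd :: acc) tl
  in
  go [] lst
```

This only uses the standard library, so it should be easy to paste into a toplevel and compare with the generated version.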

Could you show some examples using JaneStreet libraries?

Certainly! Here are some examples using some popular libraries from Jane Street, which is a financial services company that has developed and open-sourced a number of libraries for OCaml:

(* use the Core library to define a function that returns the average of a list of floats *)
open Core

let avg lst =
  let sum = List.fold_left (+.) 0. lst in
  sum /. Float.of_int (List.length lst)

(* use the Core_kernel library to define a function that returns the maximum element of a list,
   using the Monad.Sequence.max_elt function *)
open Core_kernel

let max_elt lst =
  Monad.Sequence.max_elt lst ~cmp:Float.compare

(* use the Async library to define a function that asynchronously downloads the contents of a URL *)
open Async
open Async_http_client

let download_url (url : Uri.t) : (string, exn) Result.t Deferred.t =
  try_with (fun () ->
    Client.get url
    >>= fun (_, body) ->
    Body.to_string body
  )

(* use the Alcotest library to define a unit test that checks the output of the avg function *)
open Alcotest

let test_avg () =
  let lst = [1.; 2.; 3.; 4.] in
  check float "avg" 2.5 (avg lst)

(* define a test set that includes the test_avg test *)
let tests = [
  "avg", `Quick, test_avg;
]

(* run the tests *)
let () =
  run "test suite" tests

I hope these examples give you an idea of how you can use some of the libraries developed by Jane Street in your OCaml code!
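
(Another note from me, not ChatGPT: I doubt the code above compiles as written. Alcotest is not a Jane Street library, and I could not find `Monad.Sequence.max_elt` or an `Async_http_client` package, so those look invented to me. For comparison, here is a stdlib-only sketch of the first two functions. It deliberately avoids Core so that I can be sure of the signatures, and the choice to return an option on the empty list is my own assumption, not something from the transcript.)

```ocaml
(* Average of a list of floats. Returns None on the empty list to avoid
   dividing by zero. *)
let avg = function
  | [] -> None
  | lst ->
    let sum = List.fold_left ( +. ) 0. lst in
    Some (sum /. float_of_int (List.length lst))

(* Maximum element of a list under a caller-supplied comparison function.
   Returns None on the empty list. *)
let max_elt ~compare = function
  | [] -> None
  | x :: rest ->
    Some
      (List.fold_left
         (fun best y -> if compare y best > 0 then y else best)
         x rest)
```

For example, `avg [1.; 2.; 3.; 4.]` evaluates to `Some 2.5`, which is the value the generated Alcotest case was checking for.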

How is the sequence datatype in JaneStreet libraries different from the sequence datatype in the OCaml standard library?

In the OCaml standard library, the Seq module provides functions for working with sequences, which are lazy lists that are defined by a recursive function. A sequence can be iterated over with a for loop, but it is not possible to access the elements of a sequence by index.

In contrast, the Sequence module in the Jane Street Core_kernel library represents sequences as arrays with constant-time random access and efficient slicing, concatenation, and reversal. This allows for more efficient manipulation of sequences, but it also means that sequences in the Core_kernel library are not lazy and do not support infinite sequences.

Here is an example that demonstrates the difference between the Seq module from the OCaml standard library and the Sequence module from the Core_kernel library:

(* define an infinite sequence of integers using the Seq module from the OCaml standard library *)
let rec ints_seq n =
  Seq.cons n (ints_seq (n + 1))

(* use the hd function to get the first element of the sequence *)
let first = Seq.hd (ints_seq 1)

(* use the take function to get the first 10 elements of the sequence *)
let first_ten = Seq.take (ints_seq 1) 10

(* define a sequence of integers using the Sequence module from the Core_kernel library *)
let ints_seq = Sequence.init 10 ~f:(fun i -> i)

(* use the get function to get the third element of the sequence *)
let third = Sequence.get ints_seq 2

(* use the sub function to get the second through fourth elements of the sequence *)
let second_to_fourth = Sequence.sub ints_seq 1 3

I hope this example helps clarify the difference between the Seq and Sequence modules in OCaml!
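
(A last note from me, not ChatGPT: this answer seems wrong on several points. As far as I know, Jane Street's `Sequence` is also a lazy sequence, not an array. In the standard library there is no `Seq.hd`, `Seq.take` takes the count first, and `Seq.cons n (ints_seq (n + 1))` never returns because OCaml evaluates the recursive call eagerly. Here is a minimal stdlib sketch of the `Seq` half that should actually run. `ints_from` and `take_list` are names I made up, and I wrote `take_list` by hand so it does not depend on the newer `Seq.take`.)

```ocaml
(* An infinite lazy sequence n, n+1, n+2, ...
   A Seq.t is a function from unit to a node, so wrapping the body in
   `fun () -> ...` delays the recursive call until the node is forced. *)
let rec ints_from n : int Seq.t =
  fun () -> Seq.Cons (n, ints_from (n + 1))

(* First element, obtained by forcing the first node. *)
let first =
  match ints_from 1 () with
  | Seq.Nil -> None
  | Seq.Cons (x, _) -> Some x

(* Collect the first k elements of a sequence into a list. *)
let rec take_list k (s : 'a Seq.t) =
  if k <= 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: take_list (k - 1) rest

let first_ten = take_list 10 (ints_from 1)
```

Only ten nodes of the infinite sequence are ever forced here, which is the laziness the answer was trying to describe.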

ChatGPT Dec 15 Version. Free Research Preview.