Report on first steps in OCaml

This is a report of the first steps on my journey to learning OCaml.

To give you some context, I am a relatively experienced programmer with very little free time, so I was looking for a crash course, i.e. the quickest possible introduction to the language, so that I could get to work and learn as I implemented my side project.

I know there’s work being done on the documentation site, and that’s fantastic. This report dates back to a month ago, so it might be that some resources have been updated in the meantime. cc @professor.rose who asked me about my experience as a newcomer :slight_smile:

Anyway, here goes:

  • The first question is: which resource do I use? The choice seems to be between Learn OCaml and RWO. I ended up choosing the former
  • Digression: I’m a Nix user, so had to first figure out how to take procedural install instructions (e.g. opam switch) and turn them into declarative environment declarations for a Nix shell
  • Setting up LSP: First I needed to figure out the difference between OCaml-LSP and Merlin (turns out, the former uses the latter);
    • the official docs say that Vim users don’t need to set up LSP, and use Merlin directly – why?
  • how do I test for equality? No mention of the semantics of = that I could easily find
    • even “worse”, no mention of the highly unusual <> :slight_smile:
  • I ended up finding this info here; once you know what to search for, the Cornell course also covers it
  • the stdlib pages don’t list individual functions in the sidebar, which makes it hard to skim them for what I’m looking for
  • additionally, they don’t provide any examples, which makes it much harder to figure out how to use the functions
  • ; vs ;; vs in: here I got confused. The examples only deal with the top-level, and introduce the ;; syntax which AFAIU is only used there. I got tripped up by the difference between ; and in.

For example, running ocamlc against this is a syntax error:

type nucleotide = A | C | G | T

let hamming_distance s t =
  (* iterate over s and t, which have the same length, using an index i.
      compare the character s[i] and t[i]. if they're different, increase the number
  *)
  let distance = ref 0 in
  let func i c =
    if s.[i] <> t.[i] then
      (* distance := !distance + 1 in *)
      incr distance;

  String.iteri func s;

  distance

(it needs to be String.iteri func s in)

But this works:

type nucleotide = A | C | G | T

let hamming_distance (s: nucleotide list) (t: nucleotide list) =
  (* iterate over s and t, which have the same length, using an index i.
      compare the character s[i] and t[i]. if they're different, increase the number
  *)
  let distance = ref 0 in
  List.iter2 (fun x y -> if x <> y then incr distance) s t;

  !distance

There is obviously a good reason for this, but it would be great to have a place I can easily find which explains this. I couldn’t find one.
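
For illustration, here is a minimal sketch of the distinction, using only the standard library. `let x = e in body` introduces a name that is visible in `body`, while `e1; e2` evaluates `e1` for its side effect and then continues with `e2`. The names `count_vowels` and `is_vowel` are made up for this example.

```ocaml
(* Hypothetical example: count the vowels in a string. *)
let count_vowels s =
  (* [let ... in] binds a name for use in the expression that follows. *)
  let count = ref 0 in
  let is_vowel c = String.contains "aeiou" (Char.lowercase_ascii c) in
  (* [;] sequences two expressions: the first is run for its effect,
     then the second gives the value of the whole sequence. *)
  String.iter (fun c -> if is_vowel c then incr count) s;
  !count
```

The `in` after the definition of `is_vowel` is what ends that definition; the `;` after `String.iter ...` only separates two steps of the enclosing function body.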

  • compiler errors: compiling the failing snippet above only returns:
$ ocamlc hamming.ml
File "hamming.ml", line 16, characters 0-0:
Error: Syntax error

It would be nice to have the compiler spit out the line where the syntax error happened, and maybe even tell me what I should do instead (in this case, ; should become in)
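
On the equality question raised earlier in the list, a minimal sketch of the four standard operators may help other newcomers. `=` and `<>` compare structurally (by contents), while `==` and `!=` compare physically (by identity). The value names below are made up for illustration.

```ocaml
type nucleotide = A | C | G | T

(* [=] and [<>] are structural: they compare the contents of values. *)
let same_contents = [A; C] = [A; C]
let different = A <> G

(* [==] and [!=] are physical: they compare identity, which mostly
   matters for mutable values such as references. *)
let r1 = ref 0
let r2 = ref 0
let structurally_equal = r1 = r2   (* both references contain 0 *)
let physically_equal = r1 == r2    (* two distinct references *)
let same_ref = r1 == r1
```

Here `same_contents`, `different`, `structurally_equal` and `same_ref` are all `true`, and `physically_equal` is `false`.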

14 Likes

Be careful. The in is supposed to go before the String.iteri func s to end the definition of func. Because you put the in after, you’re calling String.iteri func s in the definition of func. This would in turn be another compile error because recursive functions need to be explicit with let rec.

This is probably what you want:

type nucleotide = A | C | G | T

let hamming_distance s t =
  (* iterate over s and t, which have the same length, using an index i.
      compare the character s[i] and t[i]. if they're different, increase the number
  *)
  let distance = ref 0 in
  let func i c =
    if s.[i] <> t.[i] then
      (* distance := !distance + 1 in *)
      incr distance
  in
  String.iteri func s;
  !distance

Also, I think some OCaml programmers would prefer using an accumulator parameter for distance instead of a mutable reference.
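
A minimal sketch of what the accumulator style could look like for the string version, assuming (as in the original) that `s` and `t` have the same length. The helper name `go` is my own choice.

```ocaml
(* Sketch only: if t is shorter than s, t.[i] raises Invalid_argument. *)
let hamming_distance s t =
  (* [go i acc] walks the index i upward, carrying the running count in acc. *)
  let rec go i acc =
    if i >= String.length s then acc
    else go (i + 1) (if s.[i] <> t.[i] then acc + 1 else acc)
  in
  go 0 0
```

There is no mutable state here: each recursive call receives the updated count as an argument, and the final count is returned directly as an `int`.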

1 Like

This post gives a good explanation of OCaml semicolons: https://baturin.org/docs/ocaml-faq/

The main thing to understand is that while OCaml does have syntax that allows you to use it imperatively, it can be a little janky and counterintuitive, especially at the edges where it meets the declarative syntax.

Semicolons are a good example of this. In OCaml, code that would look like this in an imperative language:

function helper_function(a,b) {
    // do something...
    return c;
};

print("Starting!");
var a = /* something... */;
var b = /* something else... */;
function_that_changes_program_state(a);
var c = helper_function(a,b);
print(c);

would instead look something like this:

let helper_function a b = (* do something *) in
let _ = print_endline "Starting!" in
let a = (* something... *) in
let b = (* something else... *) in
let _ = function_that_changes_program_state a in
let c = helper_function a b in
print_function_for_whatever_the_type_of_c_is c

Even when we don’t care about the value something produces (like when we’re printing or changing program state) we still structure stuff like let (* X *) = (* Y *) in (* Z *), we just use _ on the left hand side to show that we’re not binding the value of (* Y *) to anything.

The semicolon is just syntactic sugar for this: instead of writing let _ = (* Y *) in (* Z *), we can write (* Y *); (* Z *). We can use it to rewrite the example as:

let helper_function a b = (* do something *) in
print_endline "Starting!";
let a = (* something... *) in
let b = (* something else... *) in
function_that_changes_program_state a;
let c = helper_function a b in
print_function_for_whatever_the_type_of_c_is c

But that’s all it is; for example, writing a semicolon at the end of the last line of the above code is not correct and may cause a syntax error depending on where in the program that block occurs.

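Your first code example desugars to something like the following. Note that if ... then without an else binds more tightly than ;, so only incr distance is conditional, and that everything after it is still part of the body of func, because no in ever closes that definition:

type nucleotide = A | C | G | T

let hamming_distance s t =
  (* iterate over s and t, which have the same length, using an index i.
      compare the character s[i] and t[i]. if they're different, increase the number
  *)
  let distance = ref 0 in
  let func i c =
    (* distance := !distance + 1 in *)
    let _ = (if s.[i] <> t.[i] then incr distance) in
    let _ = String.iteri func s in
    distance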

You could rewrite it to avoid the awkwardness of the imperative parts of OCaml like so:

type nucleotide = A | C | G | T

let rec hamming_distance s t =
  (* Remove the first element of s and the first element of t.
     Calculate the Hamming distance between the rest of s and the rest of t,
     then add one to it if the first element of s and the first element of t
     are different. If s and t are both empty, their Hamming distance is zero.
     If only one is empty they started as different lengths, and so throw an
     error.
  *) 
  match s, t with
  | s_head::s_tail, t_head::t_tail -> (* lists are not empty *) 
      (if s_head <> t_head then 1 else 0) + (hamming_distance s_tail t_tail)
  | [], [] -> (* lists are both empty *)
      0
  | _ -> invalid_arg "Lists are not the same length" 
           

Or you could even go for:

type nucleotide = A | C | G | T

let hamming_distance s t =
  List.fold_left2
    (fun dist a b -> if a <> b then dist + 1 else dist) 0 s t

1 Like

Regarding this, installing opam from Nixpkgs and then doing things imperatively works fine, and that’s what I usually do, because I move between switches a lot and find it cumbersome to write Nix files all the time. However, I agree that an easy-to-find resource detailing how to depend on OCaml declaratively would be nice. I know this one.

Edit: disclaimer: this one is probably not for beginners though.

1 Like

You skipped mentioning something (I believe), which is that you also went to https://exercism.org/ to try learning OCaml by solving problems. It’s like an interactive version of Exercises.

I think this because I tried that site and got stuck on an identical nucleotide problem. Because I was using Base/Core, I didn’t understand how to do variant equality, or why it was so hard to look up how to do it (answer: use a ppx that generates equality functions for that data type).
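
To make that concrete, here is a hand-written sketch of the kind of equality function I mean. It uses only the standard library; the name `equal_nucleotide` is my own choice, and a ppx-generated function may be named and implemented differently.

```ocaml
type nucleotide = A | C | G | T

(* Hand-written equality for the variant, spelled out case by case. *)
let equal_nucleotide (a : nucleotide) (b : nucleotide) : bool =
  match a, b with
  | A, A | C, C | G, G | T, T -> true
  | _ -> false

(* Hamming distance that goes through the explicit equality function
   instead of the polymorphic [<>]. Raises Invalid_argument if the
   lists have different lengths. *)
let hamming_distance s t =
  List.fold_left2
    (fun dist a b -> if equal_nucleotide a b then dist else dist + 1)
    0 s t
```

Writing this by hand is fine for a four-constructor type, but it is exactly the boilerplate a deriving ppx is meant to save.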

FWIW I think a better version of Exercism would be a really fun way for people to learn OCaml. When I was studying for job interviews, I solved a bunch of problems at https://www.algoexpert.io/ (kind of a much more polished version of LeetCode) and enjoyed studying more than I thought I would. The problems should probably be more tailored to OCaml than typical algorithm questions, but it’s fun to learn a language by solving smaller coding problems.

2 Likes

I would also modify the OCaml example with:

let helper_function a b = (* do something *) in
- let _ = print_endline "Starting!" in
+ let () = print_endline "Starting!" in
let a = (* something... *) in
let b = (* something else... *) in
- let _ = function_that_changes_program_state a in
+ let _ : string = function_that_changes_program_state a in
let c = helper_function a b in
print_function_for_whatever_the_type_of_c_is c

to avoid incorrectly ignoring a value, or not even calling a function because of a partial application.
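
A small runnable sketch of the same idea. `log_line` is a hypothetical function invented for this example; it prints a message and returns the number of characters written.

```ocaml
(* Hypothetical function: prints prefix ^ msg and returns the total length. *)
let log_line prefix msg =
  print_endline (prefix ^ msg);
  String.length prefix + String.length msg

let () =
  (* [let () = ...] only type-checks when the right-hand side has type unit. *)
  let () = print_endline "Starting!" in
  (* The annotation documents which value is being discarded. Writing
     [let _ : int = log_line "info: " in ...] (second argument forgotten)
     would be rejected, because a function is not an int. *)
  let _ : int = log_line "info: " "started" in
  ()
```

The annotations cost a few characters but turn a forgotten argument into a type error at the point of the mistake.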