Report on first steps in OCaml

This is a report of the first steps on my journey to learning OCaml.

To give you some context, I am a relatively experienced programmer with very little free time, so I was looking for a crash-course, i.e. the quickest possible introduction to the language, so that I can get to work and learn as I implement my side-project.

I know there’s work being done on the documentation site, and that’s fantastic. This report dates back to a month ago, so it might be that some resources have been updated in the meantime. cc @professor.rose who asked me about my experience as a newcomer :slight_smile:

Anyway, here goes:

  • The first question is: which resource do I use? The choice seems to be between Learn OCaml and RWO. I ended up choosing the former
  • Digression: I’m a Nix user, so had to first figure out how to take procedural install instructions (e.g. opam switch) and turn them into declarative environment declarations for a Nix shell
  • Setting up LSP: First I needed to figure out the difference between OCaml-LSP and Merlin (turns out, the former uses the latter);
    • the official docs say that Vim users don’t need to set up LSP and can use Merlin directly – why?
  • how do I test for equality? No mention of the semantics of = that I could easily find
    • even “worse”, no mention of the highly unusual <> :slight_smile:
  • I ended up finding this info here, but once you know what to search for, the Cornell course also covers it
  • the stdlib pages don’t list individual functions in the sidebar, which makes it hard to skim them for what I’m looking for
  • additionally, they don’t provide any examples, which makes it much harder to figure out how to use the functions
  • ; vs ;; vs in: here I got confused. The examples only deal with the top-level, and introduce the ;; syntax which AFAIU is only used there. I got tripped up by the difference between ; and in.

For example, running ocamlc against this is a syntax error:

type nucleotide = A | C | G | T

let hamming_distance s t =
  (* iterate over s and t, which have the same length, using an index i.
      compare the character s[i] and t[i]. if they're different, increase the number
  *)
  let distance = ref 0 in
  let func i c =
    if s.[i] <> t.[i] then
      (* distance := !distance + 1 in *)
      incr distance;

  String.iteri func s;

  distance

(it needs to be String.iteri func s in)

But this works:

type nucleotide = A | C | G | T

let hamming_distance (s: nucleotide list) (t: nucleotide list) =
  (* iterate over s and t, which have the same length, using an index i.
      compare the character s[i] and t[i]. if they're different, increase the number
  *)
  let distance = ref 0 in
  List.iter2 (fun x y -> if x <> y then incr distance) s t;

  !distance (* dereference, so the result is an int rather than an int ref *)

There is obviously a good reason for this, but it would be great to have a place I can easily find which explains this. I couldn’t find one.

  • compiler errors: compiling the failing snippet above only returns:
$ ocamlc hamming.ml
File "hamming.ml", line 16, characters 0-0:
Error: Syntax error

It would be nice to have the compiler spit out the line where the syntax error happened, and maybe even tell me what I should do instead (in this case, ; should become in)
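
On the equality question from the list above, here is a minimal sketch of how the four operators behave. As far as I understand it, = and <> compare structurally (by contents), while == and != compare physically (by identity). The variable names are just for illustration.

```ocaml
type nucleotide = A | C | G | T

let () =
  (* Structural equality and its negation compare contents. *)
  assert (A = A);
  assert (A <> C);
  assert ([A; C] = [A; C]);
  (* Physical equality and its negation compare identity: two separately
     allocated refs with the same contents are structurally equal but
     not physically equal. *)
  let xs = ref 0 and ys = ref 0 in
  assert (xs = ys);
  assert (xs != ys);
  assert (xs == xs)
```

So <> is the counterpart of =, and != is the counterpart of ==, which is easy to get backwards when coming from other languages.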

Be careful. The in is supposed to go before the String.iteri func s to end the definition of func. Because you put the in after, you’re calling String.iteri func s in the definition of func. This would in turn be another compile error because recursive functions need to be explicit with let rec.

This is probably what you want:

type nucleotide = A | C | G | T

let hamming_distance s t =
  (* iterate over s and t, which have the same length, using an index i.
      compare the character s[i] and t[i]. if they're different, increase the number
  *)
  let distance = ref 0 in
  let func i c =
    if s.[i] <> t.[i] then
      (* distance := !distance + 1 in *)
      incr distance
  in
  String.iteri func s;
  !distance (* dereference, so the result is an int rather than an int ref *)

Also, I think some OCaml programmers would prefer using an accumulator parameter for distance instead of a mutable reference.
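
A minimal sketch of that accumulator style, for the string version of the function. The inner helper name go is my own choice, and like the original snippet it assumes s and t have the same length (a shorter t would raise Invalid_argument on the out-of-bounds index).

```ocaml
(* Hamming distance between two strings of equal length, with the running
   count threaded through as the parameter acc instead of a mutable ref. *)
let hamming_distance s t =
  let rec go i acc =
    if i >= String.length s then acc
    else go (i + 1) (if s.[i] <> t.[i] then acc + 1 else acc)
  in
  go 0 0

let () = assert (hamming_distance "GATTACA" "GACTATA" = 2)
```

Because the recursive call is the last thing go does, this runs in constant stack space.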

This post gives a good explanation of OCaml semicolons: https://baturin.org/docs/ocaml-faq/

The main thing to understand is that while OCaml does have syntax that allows you to use it imperatively, it can be a little janky and counterintuitive, especially at the edges where it meets the declarative syntax.

Semicolons are a good example of this. In OCaml, code that would look like this in an imperative language:

function helper_function(a,b) {
    // do something...
    return c;
};

print("Starting!");
var a = /* something... */;
var b = /* something else... */;
function_that_changes_program_state(a);
var c = helper_function(a,b);
print(c);

would instead look something like this:

let helper_function a b = (* do something *) in
let _ = print_endline "Starting!" in
let a = (* something... *) in
let b = (* something else... *) in
let _ = function_that_changes_program_state a in
let c = helper_function a b in
print_function_for_whatever_the_type_of_c_is c

Even when we don’t care about the value something produces (like when we’re printing or changing program state) we still structure stuff like let (* X *) = (* Y *) in (* Z *), we just use _ on the left hand side to show that we’re not binding the value of (* Y *) to anything.

The semicolon is essentially syntactic sugar for this: instead of writing let _ = (* Y *) in (* Z *), we can write (* Y *); (* Z *). We can use it to rewrite the example as:

let helper_function a b = (* do something *) in
print_endline "Starting!";
let a = (* something... *) in
let b = (* something else... *) in
function_that_changes_program_state a;
let c = helper_function a b in
print_function_for_whatever_the_type_of_c_is c

But that’s all it is; for example, writing a semicolon at the end of the last line of the above code is not correct and may cause a syntax error depending on where in the program that block occurs.
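
To make the equivalence concrete, here is a small runnable sketch. The names counter and bump are placeholders I made up to stand in for "something that changes program state".

```ocaml
(* A piece of mutable state and a function that changes it. *)
let counter = ref 0
let bump n = counter := !counter + n

(* Sequencing written with let _ = ... in ... *)
let with_let () =
  let _ = bump 1 in
  let _ = bump 10 in
  !counter

(* The same sequencing written with semicolons. *)
let with_semicolon () =
  bump 1;
  bump 10;
  !counter

let () =
  counter := 0;
  assert (with_let () = 11);
  counter := 0;
  assert (with_semicolon () = 11)
```

One difference worth knowing: the semicolon form expects the left-hand side to have type unit and the compiler warns otherwise, whereas let _ = silently discards a value of any type.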

Your first code example desugars to something like the following. Note that if ... then binds more tightly than ;, so only incr distance is conditional, and the definition of func never reaches an in, which is why the parser gives up at the end of the file:

type nucleotide = A | C | G | T

let hamming_distance s t =
  (* iterate over s and t, which have the same length, using an index i.
      compare the character s[i] and t[i]. if they're different, increase the number
  *)
  let distance = ref 0 in
  let func i c =
    let _ =
      if s.[i] <> t.[i] then
        (* distance := !distance + 1 in *)
        incr distance
    in
    let _ = String.iteri func s in
    distance

You could rewrite it to avoid the awkwardness of the imperative parts of OCaml like so:

type nucleotide = A | C | G | T

let rec hamming_distance s t =
  (* Remove the first element of s and the first element of t.
     Calculate the Hamming distance between the rest of s and the rest of t,
     then add one to it if the first element of s and the first element of t
     are different. If s and t are both empty, their Hamming distance is zero.
     If only one is empty they started as different lengths, and so throw an
     error.
  *) 
  match s, t with
  | s_head::s_tail, t_head::t_tail -> (* lists are not empty *) 
      (if s_head <> t_head then 1 else 0) + (hamming_distance s_tail t_tail)
  | [], [] -> (* lists are both empty *)
      0
  | _ -> invalid_arg "Lists are not the same length" 
           

Or you could even go for


type nucleotide = A | C | G | T

let hamming_distance s t =
  List.fold_left2
    (fun dist a b -> if a <> b then dist + 1 else dist) 0 s t
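
For reference, a short usage sketch of the fold version. It relies on List.fold_left2 raising Invalid_argument when the two lists have different lengths, which is the same behaviour as the explicit invalid_arg in the recursive version. The sample inputs are made up.

```ocaml
type nucleotide = A | C | G | T

let hamming_distance s t =
  List.fold_left2 (fun dist a b -> if a <> b then dist + 1 else dist) 0 s t

let () =
  assert (hamming_distance [G; A; T] [G; C; T] = 1);
  assert (hamming_distance [] [] = 0);
  (* Mismatched lengths are reported by List.fold_left2 itself. *)
  match hamming_distance [A] [A; C] with
  | _ -> assert false
  | exception Invalid_argument _ -> ()
```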

Regarding this, installing opam from Nixpkgs and then doing things imperatively works fine, and that’s what I usually do, because I move between switches a lot and find it cumbersome to write Nix files all the time. However, I agree that an easy-to-find resource detailing how to depend on OCaml declaratively would be nice. I know this one.

Edit: disclaimer: this one is probably not for beginners though.

You skipped mentioning something (I believe), which is that you also went to https://exercism.org/ to try learning OCaml by solving problems, like an interactive version of Exercises.

I think this because I tried that site and encountered (and got stuck at) an identical nucleotide problem. Because I was using Base/Core, I didn’t understand how to do variant equality, or why it was so hard to look up how to do it (answer: use a ppx that generates equality functions for that data type).
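
For anyone who hits the same wall: Base shadows the polymorphic = with an int-only version, so comparing variants needs a dedicated equality function. Below is a hand-written sketch of roughly what such a function looks like. I am assuming the ppx meant here is something like ppx_compare's [@@deriving equal], which would generate a function along these lines; the name equal_nucleotide follows that convention but is written by hand, so the snippet needs no ppx or Base to run.

```ocaml
type nucleotide = A | C | G | T

(* Hand-written equality on nucleotides. The second case lists the
   constructors explicitly so that adding a new one later triggers a
   non-exhaustive match warning instead of silently returning false. *)
let equal_nucleotide a b =
  match a, b with
  | A, A | C, C | G, G | T, T -> true
  | (A | C | G | T), _ -> false

(* Hamming distance using the dedicated equality instead of <>. *)
let hamming_distance s t =
  List.fold_left2
    (fun dist a b -> if equal_nucleotide a b then dist else dist + 1)
    0 s t

let () = assert (hamming_distance [G; A; T] [G; C; T] = 1)
```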

FWIW I think a better version of Exercism would be really fun for people learning OCaml. When I was studying for job interviews, I solved a bunch of problems at https://www.algoexpert.io/ (kind of a much more polished version of LeetCode) and enjoyed studying more than I thought I would. The problems should probably be more tailored to OCaml than typical algorithm questions, but it’s fun to learn a language by solving smaller coding problems.

I would also modify the OCaml example with:

let helper_function a b = (* do something *) in
- let _ = print_endline "Starting!" in
+ let () = print_endline "Starting!" in
let a = (* something... *) in
let b = (* something else... *) in
- let _ = function_that_changes_program_state a in
+ let _ : string = function_that_changes_program_state a in
let c = helper_function a b in
print_function_for_whatever_the_type_of_c_is c

to avoid ignoring a value incorrectly or not even calling a function due to partial application
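
A small sketch of the partial-application pitfall this guards against. The names journal and record are hypothetical, made up for the illustration.

```ocaml
(* A two-argument function with a side effect: it appends to a journal. *)
let journal : string list ref = ref []
let record prefix msg = journal := (prefix ^ ": " ^ msg) :: !journal

let () =
  (* Partial application: this type-checks, but record is never actually
     run. The result is a function of type string -> unit, and let _
     throws it away. *)
  let _ = record "info" in
  assert (!journal = []);
  (* With a unit pattern, the same mistake would be a type error:
       let () = record "info" in ...
     so only the full application is accepted. *)
  let () = record "info" "started" in
  assert (!journal = ["info: started"])
```

The let _ : string = ... form in the diff works the same way for non-unit results: the annotation pins down the type you expect to be discarding.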

The stdlib is practically invisible to search engines, it seems. I tried searching for ocaml failwith in Google, DDG and Kagi, and none of them returned the function defined here.

What’s the best place to report this? I’m not sure where the repo for the API website is.

I believe it would be the ocaml/ocaml.org repository on GitHub (the official OCaml website).