Ocaml Beginner: How do I check if there are duplicate elements in a list?


#1

Hi! So I’m working on this assignment, essentially I’m having some trouble thinking about a way to go about this. The problem is to identify if a list has duplicate elements, and return true if it does. For example, [1;2;3;4] would be false, and [1;2;2;3] would return true. In java I would probably just take the first element then check if the rest are equal to the first element, and move on until the end of the list. How would I get started in OCaml?

My current code isnt much, but I know that for pattern matching, obviously I have the []->false, but to traverse the list I’m confused how to go about it


#2

I would approach this problem by creating a secondary list(initially []) and check each head element against it and then build up the secondary list with the head elements… Basically something like this…

let dupExist lst =
  let rec dupExistAux lst b =
    match lst with
    | [] -> false
    | hd::tl -> 
      if (List.exists (fun x -> x = hd) b)
      then 
        true
      else
        dupExistAux tl (hd::b) in
  dupExistAux lst []

What was I thinking(duh!)? Here’s a better way!

let rec exist elem lst =
  match lst with
  | [] -> false
  | hd::tl -> elem = hd || exist elem tl

let rec dupExist lst =
  match lst with
  | [] -> false
  | hd::tl -> (exist hd tl) || dupExist tl

#3

I would probably use a hash table here to internally keep track of items that have already been seen. For each element in the input list, add a key-value pair of element, () to the hash table and simultaneously update a list length counter. At the end, check if the list length counter is different from the hash table length (which is O(1)). If they’re different, you have duplicate elements in the list.

Relevant functions here would be List.iter, Hashtbl.create, and Hashtbl.replace.


#4

It might be nice to have a way to break on the first duplicate encountered. You could fold over the list l, building a list u of unique elements as you go. For each new element in l you will have to compare it to all elements in u; if it is equal to one of them, return true, otherwise add it to the list u and continue.


#5

That’s true. List.exists would probably be good for that, used in an imperative way with a hash table.


#6

A variant of G4143 's solution with the List.exists function :

let rec dup_exist = function
  | [] -> false
  | hd::tl -> List.exists ((=) hd) tl || dup_exist tl

#7

We can also short-circuit a bit more on every iteration. In lists with upto two elements we can tell quite easily that there are no duplicates:

let dupExist = function
  | [] | [_] -> false
  | [a1; a2] -> a1 = a2
  | list -> ...

For larger lists we can do some logic to walk the list and check for duplicates.


#8

However, hash tables mean mutation. Unless performance is critical, maps allow a pure functional solution.


#9

Mutation doesn’t mean it’s not pure-functional. If the mutation is happening inside the scope of a single function call and not leaking out across function calls, it’s completely pure. The critical point is, what side effects are observable outside the scope of a function call.

Anyway, to me, using a hash table wasn’t really about performance but more about convenience. With maps, you need to instantiate a concrete map module using the type of map keys. For a potentially polymorphic input list, you don’t know the type of map keys. This can, probably, be overcome using a locally abstract type, but with a hash table you don’t need such tricks.


#10

Your mileage may vary, but I always find that unless something really requires mutation, even inside a sealed box where it can’t leak out, I’m always happier without it. The more mutation in my code, the more mistakes I seem to make.