[SOLVED]Code review: May I remove `|> return` or `return ()` in order to run?

Recently I wrote an OCaml tool to analysis a large log file. To make it easy to read, I simplified the log format as

1
1
2
2
3
3
3
4
4
4
4
4

Each line of the log is a number between 0 and 10, my job is to analysis the count of each number, for example, the result above would be

Key: 1, Value: 2
Key: 2, Value: 2
Key: 3, Value: 3
Key: 4, Value: 5

Once again, to make it easy to read, I simplified the OCaml code as

open Core
open Async

let touch_tbl (imap: int Int.Map.t) (spot: int) : int Int.Map.t Deferred.t =
  let count =
    match Int.Map.find imap spot with
    | Some x -> x
    | None -> 0
  in
  Int.Map.add imap ~key:spot ~data:(count + 1) |> return

let get_dict (filename: string) : int Int.Map.t Deferred.t =
  let%bind lines = Reader.file_lines filename in
  let tbl = Int.Map.empty in
  Deferred.List.fold
    (List.map lines ~f:int_of_string)
    ~init:tbl
    ~f:(fun acc num -> touch_tbl acc num)

let handle_file (filename: string) : unit Deferred.t =
  let%bind dict = get_dict filename in
  Int.Map.iter_keys
    dict
    ~f:(fun key ->
        print_string "Key: ";
        print_int key;
        print_string ", Value: ";
        print_int (Int.Map.find_exn dict key);
        print_newline ();
      );
  return ()

let () =
  Command.async
    ~summary:"Analysis performance log"
    Command.Spec.(
      empty
      +> anon ("file" %: string)
    )
    (fun filename () -> handle_file filename)
  |> Command.run

I compiled and ran the code, it works as expected:

$ corebuild -pkg async phonlog.native
$ ./phonlog.native sim.log
Key: 1, Value: 2
Key: 2, Value: 2
Key: 3, Value: 3
Key: 4, Value: 5

I was curious about:

  1. |> return at the end of the touch_tbl function, can I remove it in some other way?
  2. Deferred.List.fold in the get_dict function, I used to use List.fold, but found I couldn’t compile until changed it to the Deferred.
  3. return () at the end of the handle_file function, can I remove it in some other way?

I used to write Java program, and I have to admit that I spent a lot of time to debug my code to make all the types ok.

1 Like
  1. It seems like you’re getting confused about the reason for using Async. Async is a way to run code with concurrency ie. to do multiple things at once. You’re writing a very simple program here, and adding Async complicates your code significantly, since you’re adding a monad. Even if you needed to wrap your code in Async, you don’t need to make every function create a Deferred.t.
  2. You’re explicitly specifying the types of each function. This is great for learning, and is probably important at your stage, but is unnecessary in the long term.
  3. Your printing is cumbersome, because when you want to print multiple things, the easiest way is to use the Printf module.

Here’s how you would write this code succinctly and without Async:

    open Core

    let get_dict filename =
        In_channel.with_file filename ~f:(fun file ->
            In_channel.fold_lines file
                ~init:Int.Map.empty
                ~f:(fun map line ->
                    Int.Map.change map ~key:(int_of_string line)
                        (function None -> Some 0
                                | Some x -> Some (x + 1))
                )
        )

     let () =
         let dict = get_dict "file.txt" in
         Int.Map.iter dict
             ~f:(fun key val -> Printf.printf "Key: %d, Value: %d\n" key val)
2 Likes

Or is the compiler debugging your code?

Among the reasons for the steep learning curve of OCaml, the type system is one worth learning. Async also brings in the topic of IO monads (aka futures aka promises aka deferred values), which apart from providing concurrency, refines the types, which is generally considered a good thing.

To your points:

  1. Yes, you can omit it and drop Deferred.t from the type annotation of touch_tbl. The reason is that the code does not do any input or output, so there is no reason to defer the value. The return would then need to be moved down where you call touch_tbl. However,

  2. You can then switch Deferred.List.fold back to List.fold and return the whole result instead.

So, what we have done so far is to factor the return function out of code which does not use IO which amounts to extracting pure values out of the IO monad. The return didn’t go away, but we can make it if we use a different IO combinator. Consider

(>>=) : α t -> (α -> β t) -> β t
(>>|) : α t -> (α -> β) -> β t

let%bind x = m1 in m2 is syntactic sugar for m1 >>= (fun x -> m2), which is the future that schedules m1 with a callback, which receives the result as x and runs the successor future m2 to finish the operation. The second operator differs from the first in that it merely transforms the value computed by the LHS, rather than starting a successor future. This is exactly what’s needed in get_dict, since it’s just post-processing the read lines. That is, you could write

let get_dict (filename: string) : int Int.Map.t Deferred.t =
  Reader.file_lines >>| fun lines ->
  let tbl = Int.Map.empty in
  List.fold
    (List.map lines ~f:int_of_string)
    ~init:tbl
    ~f:(fun acc num -> touch_tbl acc num)

or even

let get_dict (filename: string) =
  Reader.file_lines filename
    >>| List.map ~f:int_of_string
    >>| List.fold ~init:Int.Map.empty ~f:touch_tbl

And then the solution to the last point should follow.

1 Like