Why don't we have a composition operator in Pervasives?

I’d like to add my 2 cents on what makes a program more understandable than another:

  1. A simpler structure
  2. Hints for interpreting the intended behavior

Regarding point 1: Code length is not a good metric for those. Replacing (fun x -> f (g x)) with e.g. f <.> g does not simplify the structure of the program in any meaningful way. I suppose it can be useful in specific applications where the operator is used a lot in one specific module, but in this case it’s straightforward to define it locally. In my experience, such custom operators are never exactly function composition and we end up defining custom operators anyway.

Regarding point 2: Hints for interpreting intended behavior in OCaml and other text-based programming systems include pseudo-English names, comments, and consistency with the rest of the program. Mathematical symbols with a universal meaning could be treated as names, but that’s not the case for <.>.

1 Like

Personally Yaron, I read from left to right. So, I really dislike the |> operator (because your code must be read from right to left to understand what it does).

Very nice suggestion. This is truly the best way to write this operator.
Is there a hack to be able to do this in my code?
Because, well, you know, if you wait for something to enter the stdlib…

Do you read imperative code from bottom to top/right to left? It uses the same order as |>.

1 Like

I believe that latin1 chars were deprecated in OCaml to facilitate an eventual state where utf8 is used for source code, but I don’t know if that’s currently accepted by the compiler for identifiers. In any case, I don’t think the compiler currently would accept arbitrary unicode math operators as operators in the grammar. OCaml presumes that all infix operators have to start with a character from the set

=  <   >   @   ^   |  &  +  -  *  /  $  %

and so there is no way to use a unicode char as an infix operator…

(Presumably a future version of OCaml could add, say, all the math operators from Unicode to the set of permitted characters in operators, but as things stand this would be slightly messy, especially since precedence, associativity, and fixity are currently hardwired in to the language rather than being something you can declare in your program.)

1 Like

I’m very grateful that @@ was introduced into the stdlib, because I generally find the functional order of operations far more readable, though at this point, I like |> as well.

I think the bottom line here is that people have different writing styles, and just because we’re personally used to one style, it doesn’t mean others are invalid. Many people define the composition operator because it’s genuinely useful (for those people), and it would be helpful for this operator to be uniform so we immediately recognize it rather than having to figure it out and then mentally translate.

3 Likes

Everyone’s free to add their own operators. But that doesn’t mean it’s a good idea to support every possible style in the standard library. A standard library makes choices about shared idioms, and those decisions should be made judiciously, and not simply throw together support for every possible style.

In any case, this proposal has been discussed and rejected fairly recently, so I doubt it will change soon.

2 Likes

I agree that it isn’t always what you want; but it’s often just the thing, letting you think about your code as a pipeline. And from my perspective, when I want the other application order (which is pretty often!) I just use parens in the ordinary way.

(Also, your user id is @UnixJunkie! Don’t you like unix piplines? That’s what the |> style looks closest to!)

y

3 Likes

I love unix pipelines, and they are evaluated left to right…

Time for little quiz I guess. How this is evaluated?

2  |> ( + ) 2 |> ( * ) 3

Another possibly interesting example:

5  |> ( - ) 2 |> ( * ) 3

And what is the higher-order composition operator for you ? That’s what it is intended for. :wink:

Don’t you see the pipeline from space (or type) A to space D ?

(* pipeline from left to right *)
let (>>) f g x = x |> f |> g

(* pipeline from right to left, the composition operator in mathematic *)
let (<<) f g x = x |> g |> f

let double x = 2 * x
let square x = x * x

(double >> succ >> square) 3;;
- : int = 49

(square << succ << double) 3;;
- : int = 49

List.map (double >> succ >> square) [1; 2; 3];;
- : int list = [9; 25; 49]

(* we can easily define our own pipelines *)
let my_pipeline = double >> succ >> square
let show_pipeline = string_of_int >> print_endline

(* and chain them *)
List.iter (my_pipeline >> show_pipeline) [1; 2; 3];;
9
25
49
- : unit = ()

@UnixJunkie: i do not understand what you say, |> is evaluated left to right like unix pipelines.

3 Likes

Absolutely! They serve much of the same purpose, which is why I don’t think we need both as infix operators.

y

If I understand correctly, your position is that since we can write my last examples this way:

let my_pipeline x = x |> double |> succ |> square
let show_pipeline x = x |> string_of_int |> print_endline

List.iter (fun x -> x |> my_pipeline |> show_pipeline) [1; 2; 3];;

we don’t need other infix operators. Am I right?

That’s mostly a matter of taste. Since I’m used to mathematical notations, I prefer infix operators that operate on functions. The interest I find in >> and << is that we “see” the pipeline’s flow.

I don’t understand why people who prefer this idiom should define their own operators and can not rely on the standard lib to have a uniform notation or, otherwise, use an idiom they dislike to stay standard.

1 Like

OCaml doesn’t have the function composition operator mainly because of the value restriction. Here is an example, that worthwhile hundreds of words:

# let (++) f g x = f (g x);;
val ( ++ ) : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun>
# ident ++ ident;;
- : '_a -> '_a = <fun>

Thus, no matter, how useful the composition operator looks to you, it just will not work in OCaml, because functions are not considered as first-class values by the type system. Of course, it still can be used for ground terms, and you can define this operator, as I did before and use it if you believe that it will make your code more readable. My personal opinion is that the application operator, as well as the reverse application (@@ and |>) are much more intuitive, at least for those of us, who prefer to read from left to right. But your mileage may vary.

And for the historical reference, this is a link to, probably, the first discussion of the composition operator in OCaml, that is available on the Internet.

12 Likes

The OCaml developers have given us a powerful rewriting tool for when we think they are a bit too conservative about a new feature:

# #require "ppx_compose";;
# let ident x = x;;
val ident : 'a -> 'a = <fun>
# ident % ident;;
- : 'a -> 'a = <fun>

At least that solves the issue with the value restriction. About consensus for the operator, the best I could think of was to reuse the operators from an existing library (Batteries Included).

5 Likes

Thanks for the reference. I’ve read it few years ago and I still disagree with Pierre Weis.

It’s frequent that we have to use composition in a monomorphic context, so the value restriction is rarely a problem. But eta expand the pipeline leads to a lose of efficiency:

let make_add () =
  let r = ref 1 in
  (fun () -> r := 1),
  (fun i -> 
    Printf.printf "Add has been evaluated %d times\n" !r;
    incr r; (+) i)

let reset, add = make_add()

map (fun x -> x |> add 1 |> add 2) (1 -- 5);;
Add has been evaluated 1 times
Add has been evaluated 2 times
Add has been evaluated 3 times
Add has been evaluated 4 times
Add has been evaluated 5 times
Add has been evaluated 6 times
Add has been evaluated 7 times
Add has been evaluated 8 times
Add has been evaluated 9 times
Add has been evaluated 10 times
- : int BatEnum.t = 4 5 6 7 8

reset (); map (add 1 %> add 2) (1 -- 5);;
Add has been evaluated 1 times
Add has been evaluated 2 times
- : int BatEnum.t = 4 5 6 7 8

With infix composition your pipeline is evaluated only once, but evaluated each time we apply it when it is eta expanded. That’s inefficient !!!

Concerning the notation, both batteries and containers use %> and % instead of >> and << which is better for precedence level with @@:

double %> succ @@ 3;;
- : int = 7

succ % double @@ 3;;
- : int = 7

Concerning the value restriction, Base.Fn has a const function which has the same problem and writing const 1 instead of fun x -> 1 is not really a gain.

3 Likes

One strategy is to make it evident using the type system when you have “costly” partial applications (i.e., partially-applied functions that are expensive to construct because they do something with arguments as they are passed one-by-one).

For example, Base does this with a module called Staged:

type +'a t
val stage : 'a -> 'a t
val unstage : 'a t -> 'a

So for example you might have a regexp-matching function that first compiles the regular expression:

(** [does_match pattern input] *)
val does_match : string -> string -> bool

You might be tempted to use it like so, which would be expensive because it would compile the regular expression over and over again:

let nums = List.filter (fun str -> does_match "\\d+" str) many_strings in ...

But if does_match is changed to have the following type:

val does_match : string -> (string -> bool) Staged.t

Then it becomes obvious that partially applying the function to only its first argument is “expensive”, so then it is clear that the preceding code should be written like so, where we save the work of compilation:

let is_number = unstage (does_match "\\d+") in
let nums = List.filter is_number many_strings in
...

The different type makes it much more obvious when the result of partial application should not be evaluated more than once for performance (or other) reasons.

Are you serious? Have you read the discussion?

I have, for example, two functions f : int -> int -> int and g : string -> int -> int, and I want to create a pipeline with them. @Yaron_Minsky told me that I just have to write:

let my_pipeline x = x |> f 1 |> g "I dislike composition"`

Then I answer this is less efficient than composition because the pipeline is evaluated each time I use it, which is not the case if I simply write:

let my_pipeline = f 1 %> g "I like composition"

And now, you tell me that I should use poor man staged programming with the identity monad! :roll_eyes:

1 Like

Right, and I was addressing the concern of performance by pointing out that in cases that you do have non-trivial partially-applied functions you can simply give a name to the partially-applied version. (The identity monad helps you mark where you have performance implications for partial application.) Most of the time, I wager, the function you are working with does not behave that way.

If you want to do the performance comparison for non-staged partial application, the result is contrary to what you are suggesting (|> is faster than %>):

open! Core
open Core_bench

let (%>) f g x = g (f x)

let f x y = x + String.length y
let g x y = x * y

let pipeline_composed = f 1 %> g 2

let pipeline_eta_expanded x = x |> f 1 |> g 2

let () =
  let composed     = Test.create ~name:"composed"     (fun () -> pipeline_composed "hello")     in
  let eta_expanded = Test.create ~name:"eta expanded" (fun () -> pipeline_eta_expanded "hello") in
  Bench.make_command [ composed; eta_expanded ]
  |> Command.run
;;

With the output:

Estimated testing time 20s (2 benchmarks x 10s). Change using -quota SECS.
                                        
  Name           Time/Run   Percentage  
 -------------- ---------- ------------ 
  composed         6.79ns      100.00%  
  eta expanded     2.55ns       37.55%  

(at least on my machine, a 1.4GHz Macbook Air running OCaml 4.05.0, no flambda)

I’m not going to hazard any speculation about why the performance of the two pipelines are so different in the first place, since I’m certainly not qualified to, but that’s the result I got, FWIW. YMMV.

Sorry my initial post wasn’t clear about what point I was addressing. I should have included an example or something.

3 Likes