Ad hoc polymorphism and usability

What’s the issue with classes and mixins? Classes can be used to implement mixins in OCaml just fine as far as I can tell.

That’s an interesting way to rationalize it, but I doubt the original intention was to make objects slow. In any case, it certainly succeeded in discouraging their use. We also shouldn’t forget that the awkward syntax and the comparatively poor Merlin support played their part as well.

Getting a bit off-topic, but as far as I know objects are not particularly slow. Method calls are a bit slower than function calls (roughly a hash table lookup + an indirect function call if I remember correctly), but any language with dynamic dispatch will have to pay this cost in the general case.
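
For concreteness, here is a minimal sketch of the kind of call being discussed (the object and its names are just for illustration):

let counter = object
  val mutable n = 0
  method incr = n <- n + 1
  method get = n
end

let () =
  counter#incr;
  Printf.printf "%d\n" counter#get

Each counter#incr or counter#get goes through the object’s method table (roughly the lookup plus indirect call mentioned above), whereas a call to a plain function can be a direct jump.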

Rather, for me, the main downside of using objects is that their typing theory (which is also the basis of their usefulness) is considerably more complex than for the rest of the language. That alone justifies, in my opinion, a “use only when needed” policy.

Just my 2c.

Cheers,
Nicolas

2 Likes

The problem is that in OCaml you pay that cost not only in the general case.

Interestingly, in Rust, one gets to explicitly decide whether or not to dynamic dispatch, and if you’re not doing it/don’t need it, you don’t pay the price. But we are truly getting far from the topic.

I would add a “+1” to the opinions expressed in this topic regarding code readability. I very much like how String.(a == b) immediately tells the reader that strings are being compared here, and how Queue.append signals that the last positional argument is a Queue.t. If all containers and monads had corresponding implicit modules defined and actively used, reading the code would be a nightmare even with IDE support, as you’d have to hover like mad to reconstruct the context of what’s happening in the function.

5 Likes

Sorry, but I don’t think this is a good example. The local open String.(...) does not help you figure out the provenance of the bindings in (...). If anything it makes the task a bit harder, due to the enlarged scope!

Perhaps you meant String.equal a b?
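
To make that concrete, a small self-contained example (String.make is used so the two strings are distinct allocations):

let a = String.make 3 'x'
let b = String.make 3 'x'

let _ = String.(a == b)   (* still Stdlib's physical equality: false here *)
let _ = String.equal a b  (* structural string comparison: true *)

Since String does not define (==), the local open changes nothing about which operator is used.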

Cheers,
Nicolas

  • From the compiler’s perspective, entropy increases: more content in scope means a bigger haystack in which to find the same needle
  • From a practical standpoint, local opens tend to be surgical, and they likely reorder which modules the reader considers when identifying symbols, hoisting the locally opened module to the top of the mental search list

So while it’s hard to argue the technical merits of a larger scope, the local open does tend to guide your search through the haystack.

As an aside, OCaml’s module symbol exposure is among my top-3 least favorite things about the language. I would gladly expose everything I need explicitly in higher scopes, as I do in other languages (sans globals). I suspect I’m in a small party on this matter, however :slight_smile:

1 Like

To be fair, if one writes String.(a == b) and (==) does not come from String, they are actively doing code obfuscation; AFAIK no language (feature) can protect against that.

5 Likes

I believe the main reason people feel objects are slow is that they’re impossible to optimise correctly with the current implementation.
In bytecode, things are mostly fine, as very few optimisations are done anyway so method calls are only slightly slower than regular function calls.
But in native, while { f = (fun x -> x) }.f 0 should be simplified to 0 (at least with Flambda), the equivalent (object method f = fun x -> x end)#f 0 will not be simplified at all. Worse, while intuitively a simple object expression should be compiled to a simple allocation, the code for compiling objects and classes is shared so a lot of unnecessary code that is only useful for classes is generated. Oh, and method calls can’t be inlined either.
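
To make the comparison concrete, a minimal sketch of the two forms (the record type is only there so the snippet compiles on its own):

type r = { f : int -> int }

let from_record = { f = (fun x -> x) }.f 0
(* with Flambda, this can be reduced to the constant 0 *)

let from_object = (object method f x = x end)#f 0
(* compiled through the generic method-dispatch path; not simplified *)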

So I think it is correct to say that objects are slow in OCaml, even if it’s due more to technical details and the lack of any effort to improve the situation in the last twenty years than to any theoretical issue or design compromise.

2 Likes

I agree with that: modules are very nice, and scoping is a good mechanism for “type-based” dispatch. It makes everything more explicit and does not add complexity to the language.
However, the advanced module machinery is way too cumbersome. For instance, to use a string map, you have to write:

module StringMap = Map.Make(String)
let a = StringMap.empty

Whereas I believe you should be able to write:

let a = Map(String).empty

The reason this is not allowed is that modules are evaluated at runtime, so this application would be re-evaluated at each use and would not be optimised away. Worse than that: it is impossible to optimise it away in the general case, because there might be side effects in the functor definition.
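
A minimal sketch of why applications cannot be merged in general: a functor whose body performs a visible side effect.

module Noisy (X : sig end) = struct
  let () = print_endline "functor body evaluated"
end

module A = Noisy (struct end)  (* prints once *)
module B = Noisy (struct end)  (* prints again: each application runs the body *)

If Map(String).empty implicitly applied the functor at each use, such effects would be duplicated, so the compiler cannot safely share applications without knowing the body is pure.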

I just had the idea that providing lazy functors with a different syntax would solve this issue.
“constexpr” functors could also do the trick, but this seems harder to do.
A way to define whole files as functors would also be required (this seems like low-hanging fruit for the language as a whole).

1 Like

Well that might be an accident of refactoring. Or, the expression could be larger and as a reader you wouldn’t know which symbols specifically are supposed to be captured. As much as I like local opens, I find they have 2 shortcomings:

  1. String.(a == b) is not the same as String.(==) a b, i.e. it doesn’t guarantee you that == is selected from the opened module [case in point: with pure Stdlib, the former is just a == b i.e. polymorphic physical equality, which perhaps was unexpected].
  2. Selected symbols are… well, shadowed: for instance, consider these expressions with big integers: fib.(i) <- Z.( fib.(i-1) + fib.(i-2) ) or Z.( ~$5 * pow ~$3 (n+1) ). You’re out of luck: they also involve native integers… So you’d rather use the prefix notation Z.(+), or pre-compute the indices outside Z’s scope (sketched just after this list), which defeats the purpose of infix operators. Or nest local opens, but then your expression grows larger and less readable (plus, little stdlib prank, Int.(+) doesn’t exist).
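
For the Z example, here is the pre-computation workaround sketched out (assuming Zarith’s Z module and fib : Z.t array):

let fib_step fib i =
  (* native-int index arithmetic happens outside Z's scope... *)
  let a = fib.(i - 1) and b = fib.(i - 2) in
  (* ...so the local open only covers the big-integer addition *)
  fib.(i) <- Z.(a + b)

It works, but it splits what was conceptually a single expression, which is exactly the readability cost described above.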

The second point would go away with type-based disambiguation of overloading. The first point would also be alleviated if, in addition, we stopped having comparison builtins with parametric polymorphism, and rather used overloading for that.

PS: I don’t think discussing local opens or method dispatch is off-topic; they are related to addressing ad-hoc polymorphism / overloading in a broad sense.

1 Like

I think a bigger issue is the type equality of abstract types. How would you determine whether Map(String).t = Map(String).t? Sometimes that is what you want, other times it’s not. Currently StringMap.t = StringMap.t and Map.Make(String).t <> Map.Make(String).t.

1 Like

I think your example is wrong:

module StringMap1 = Map.Make(String)
module StringMap2 = Map.Make(String)

let map1 = StringMap1.empty
let map2 = StringMap2.add "hey" 1 map1

typechecks correctly on my computer.

However, regarding my “lazy functors” proposal, we would have Map[String].t = Map[String].t (I use [ ] as opposed to ( ) to specify this is a lazy functor being called).
That’s because the Map functor would only be evaluated once, so both would be the same module; and also because not doing it this way would make the feature completely unusable.
It’s true that sometimes you do want the types to be different, but in that case it is possible to hide the equality with the tools the language already provides: storing the result in a named module with a custom signature.

It’s however true that for the functor to be truly lazy, we need a way to hash the input module in order to check whether it was already evaluated. Physical equality should suffice, but it is possible that the GC moves the pointers, which would make this a pain. Without knowing much more than that, it does not seem to be a huge difficulty.

Huh, indeed. Map.Make(String).t <> Map.Make(struct include String end).t at least. Looks like type equality is derived based on the identity of the functor arguments then. That’s quite surprising to me, and seems to be undocumented from what I can tell. But that also means there’s already a notion of module identity that could be used to implement lazy functors, no?

Languages/runtimes that do whole-program optimization have a devirtualization phase which can improve the performance of (virtual) method calls. Do you know of any languages/runtimes with a file-by-file compilation scheme that are able to do similar optimizations? Maybe we can adopt some ideas from there for OCaml…

IIUC, this is because OCaml functors are “applicative” by default. You seem to be expecting the behavior of generative functors here. See, e.g., “What is the difference between Applicative and Generative Modules, and Type Classes?” on Software Engineering Stack Exchange.
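
A minimal sketch contrasting the two behaviours:

module A1 = Map.Make (String)
module A2 = Map.Make (String)
(* applicative: A1.t and A2.t are compatible, as in the example above *)

module Gen () = struct type t = T end
module B1 = Gen ()
module B2 = Gen ()
(* generative (note the ()): B1.t and B2.t are distinct types *)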

2 Likes

↑ I second this 10x. The clunkiest part of OCaml is not having a polymorphic print statement. If OCaml had that, 90% of the rationale for derivable ad-hoc polymorphism would disappear.

The hacked version of Elm I’ve used has a polymorphic toString (for debugging) and a polymorphic compare (for putting any item in a Set or Dict). With those two special cases, I did not miss generic ad-hoc polymorphism.

@dbuenzli notes a polymorphic debug printer might require run-time type info: that type info by itself would be of great use to me. I’m building live programming environments. But because the type of an OCaml value is not recoverable at runtime, I am forced to run OCaml code through an interpreter so that I can treat all values as members of an ADT. Runtime type information might make it possible to avoid the interpretative overhead!

4 Likes

I will note that Rust has no run-time type representation information and is nonetheless capable of providing a macro that does “polymorphic print” without any sort of loopholes in the type system. This is perhaps the most common use of ad hoc polymorphism on a day-to-day basis in the language (ignoring things like being able to use + for any numeric type), but it is hardly the only such use; as I said, it’s just plain really convenient to have ad hoc polymorphism.

1 Like

Not to put words in his mouth, but I suspect @dbuenzli is not talking about run-time type information but rather a standardized way of representing the structure of a type so that serializations and other functions commonly written via deriving can be written more generically. See the remarks and ensuing discussion here.
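
A minimal sketch of that idea (all names here are hypothetical): a value-level description of a type’s structure, over which a generic printer recurses, with no runtime type information involved:

type _ ty =
  | Int : int ty
  | String : string ty
  | List : 'a ty -> 'a list ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

let rec show : type a. a ty -> a -> string = fun ty v ->
  match ty with
  | Int -> string_of_int v
  | String -> Printf.sprintf "%S" v
  | List t -> "[" ^ String.concat "; " (List.map (show t) v) ^ "]"
  | Pair (ta, tb) -> Printf.sprintf "(%s, %s)" (show ta (fst v)) (show tb (snd v))

let () = print_endline (show (List (Pair (Int, String))) [ (1, "a"); (2, "b") ])

A deriving-style mechanism would generate such descriptions per type; the generic functions themselves remain ordinary, type-erased code.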

2 Likes

It’s relevant to note that the absence of runtime type information is precisely one of the defining attributes of languages in the ML family, and has been part of the definition more-or-less since the jump. “Evaluation is invariant under type-erasure” (or something like that).

ETA: Lots of us use ML for precisely this reason, and would regard any attempt to reintroduce runtime type information as anathema.

7 Likes