"OCaml -- first impressions"



I disagree. When you first come across a language, your first question is never “Does it support multi-threading?” or “Can I overload functions?”.

On the other hand, when you first need to look up the docs, and you land on a page like this, I understand it is very disappointing, even more so if you’re used to documentation that looks like Ruby’s.

The iOS REPL is kinda extreme, though. :joy:


Yeah, that’s never a question, because almost all users expect math operators to work on every numeric type, which isn’t the case in OCaml.

Again, no one (except maybe Python programmers) expects MT to be broken. And when a user discovers this, it’s a huge turn-off.

Also, I don’t see why you think the List module documentation is disappointing. It’s a reference, not a tutorial. And unlike Ruby, OCaml has pretty good types, which also serve as documentation.


It really depends on your target audience. Python and Ruby programmers make up a huge chunk of the programming world nowadays. Add JavaScript devs, and you’ve got the great majority of programmers not expecting multithreading. Of course, anyone from Haskell, C++, Java, C# etc. who’s looking to build some systems software would be greatly disappointed at the current lack of MT, but green threading is a thing even in those languages, and there are good reasons for it.


I’m afraid we won’t be able to reach an agreement here, @Alex. Our opinions on the topic diverge way too much, so let’s not start a dialogue of the deaf. I hear your points, and I hope you hear mine. (Deaf, hear… Got it?)

Anyway, being beginner-friendly and being powerful aren’t incompatible, but doing both requires a lot of work. And recent feedback suggests that we aren’t putting enough effort into the “beginner-friendly” side.


Hi there. I saw a link to this thread in the referrers list on medium. I’m the original author so I’d like to clear up some questions and misunderstandings:

  • I do know about utop, but the point is more like “there exists something that is good, so why ship something bad by default?” Python does something similar with its default REPL, which might or might not work with arrow keys and such, while ptpython pretty much always works. If you don’t want to lock the nice one into the standard library, just ask to install it on first launch.
  • Multithreading: well, this is a complicated subject :stuck_out_tongue: MT is really quite badly broken in my opinion in C++, Java, etc. Clojure is the only language I’ve used that doesn’t have badly broken MT. So OCaml not having an MT solution seems to me to be an advantage, if it’s good like Clojure (or Erlang maybe?) when it arrives. The C/C++ way of just blaming the user when things go badly is not “having MT” in my opinion :stuck_out_tongue:
  • Significant whitespace: I’m talking about the impression one might get. I understand that OCaml doesn’t have significant whitespace (except space separating tokens!), but code examples are laid out in a way that looks like there is, if that makes sense?
  • The manual: glad to see people looking at it. This forum is pretty impressive in design and UX btw!
  • Unicode: I would certainly bash Go for not doing Unicode, if it weren’t so low on my list of serious problems with Go :stuck_out_tongue: Punting on the issue by saying “it’s UTF8” is not a serious solution, I’m afraid. Some issues: a) how do you iterate over code points? b) what about normalizations? c) how do I know if something is a series of bytes or a piece of text if there is no type difference?
  • iPhone app: You guys should really check out Pythonista, it’s absolutely fantastic. Being able to learn a new language on the commute is very nice and if only Python can do it (which is my experience having tried many apps, including Swift that only does iPad!) that’s a competitive advantage when trying to get new users. And it really shouldn’t be that hard to create something passable.
  • Overloading: sure, that’s weird. Seems just as bad as Elm or Haskell from what I can tell. But these impressions are way before getting to any serious code.

Phew, I think that was all. Hopefully this clears things up! Thanks for taking this “article” seriously!


It’s not punting on the issue. We are very well aware of the limitations of the current approach. There are reasonably well functioning libraries for doing what you want, they just happen not to be part of the standard library, and for some of them, for good reasons.

Over time I’ve come to find it amusing that people forming a first impression of OCaml almost always mention this “unforgivable” sin.

But the reality is that among the languages out there that do have a type for Unicode strings in their standard library, very few have a non-broken one (the only ones I know for sure are not broken are Swift and Rust).

I personally don’t see much difference and even prefer almost no (and hence sound) support rather than broken support — e.g. JavaScript, Python, Java, etc. in which you may have a type for Unicode strings but can’t do a) for Unicode scalar values, i.e. the actual textual content, nor answer c).


I believe Python 3.3 fixed iterating over Unicode code points. That was in 2012.


Indeed, you can now iterate over code points, but your indices should iterate over scalar values. Since as of 2017 they apparently still haven’t fixed c), this is the mess you get:

> python3
Python 3.6.1 (default, Apr  4 2017, 09:40:21) 
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> '\uD800' # unpaired surrogate
'\ud800'
>>> '\uD800'[0]
'\ud800'
>>> '\uD800'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
>>> '\uD83D\uDC2B' # paired surrogates representing U+1F42B
'\ud83d\udc2b'
>>> '\uD83D\uDC2B'[0]
'\ud83d'
>>> '\uD83D\uDC2B'[1]
'\udc2b'
>>> '\uD83D\uDC2B'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
>>> '\U0001F42B'
'🐫'
>>> '\U0001F42B'[0]
'🐫'

So your python3 string doesn’t represent a sequence of Unicode scalar values, which, as a programmer, is the minimal model you’d like to be able to work with (Swift made a bolder, more programmer-friendly move for certain scripts, as you index by grapheme clusters).


So the way to handle ‘\uD83D\uDC2B’ is to normalize it? I haven’t come across this case so not really sure what you’re talking about quite frankly :stuck_out_tongue:


No, this has nothing to do with Unicode normalization. This has to do with the fact that Unicode code points (integers 0x0000 to 0x10FFFF) do not always represent text; Unicode scalar values (integers in the ranges 0x0000…0xD7FF and 0xE000…0x10FFFF) do.
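To make the distinction concrete, here is a minimal Python sketch of the two ranges given above. The helper names `is_scalar_value` and `is_scalar_string` are made up for illustration; they are not part of Python’s standard library.

```python
# Hypothetical helpers, for illustration only.

def is_scalar_value(cp: int) -> bool:
    """Return True if the integer cp is a Unicode scalar value,
    i.e. a code point that is not a surrogate (0xD800..0xDFFF)."""
    return 0x0000 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def is_scalar_string(s: str) -> bool:
    """Return True if every code point of s is a scalar value."""
    return all(is_scalar_value(ord(ch)) for ch in s)


# '\U0001F42B' is one scalar value; '\uD83D\uDC2B' is two surrogates.
print(is_scalar_string('\U0001F42B'))     # a proper piece of text
print(is_scalar_string('\uD83D\uDC2B'))   # a str, but not text
```

A check like this only detects the problem after the fact; it doesn’t stop such strings from being created.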

Ways of handling this would be to either completely disallow surrogate specification in escapes, or to translate that literal string on the fly to ‘\U0001F42B’ and error on unpaired surrogate escapes. These would be steps towards ensuring you actually do have c) in python3 (I don’t know if there are other means by which such bogus strings can be created), which would then give you a) on scalar values.
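The second option can be sketched in user code. This is only an illustration of the idea, done after the string already exists rather than in the parser where it would really belong; `combine_surrogates` is a hypothetical name, not an existing Python function.

```python
def combine_surrogates(s: str) -> str:
    """Translate each high/low surrogate pair in s into the scalar value
    it stands for, and raise ValueError on any unpaired surrogate."""
    out = []
    i = 0
    while i < len(s):
        cp = ord(s[i])
        if 0xD800 <= cp <= 0xDBFF:  # high surrogate: needs a low one next
            if i + 1 < len(s) and 0xDC00 <= ord(s[i + 1]) <= 0xDFFF:
                low = ord(s[i + 1])
                # Standard UTF-16 decoding of a surrogate pair.
                scalar = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                out.append(chr(scalar))
                i += 2
                continue
            raise ValueError(f"unpaired high surrogate at index {i}")
        if 0xDC00 <= cp <= 0xDFFF:  # low surrogate with no high one before
            raise ValueError(f"unpaired low surrogate at index {i}")
        out.append(s[i])
        i += 1
    return "".join(out)


fixed = combine_surrogates('\uD83D\uDC2B')
print(fixed == '\U0001F42B')   # the pair became a single scalar value
print(fixed.encode('utf-8'))   # encoding now succeeds
```

With the result guaranteed to hold only scalar values, indexing and iteration work on scalar values and encoding to UTF-8 can no longer fail on surrogates.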

You may then be interested in reading my minimal Unicode introduction.


I am happy to help!


The so-called minimal Unicode introduction is in fact well above the minimum!
It shows great attention to the details that make the difference between sloppy and true support for Unicode.

This level of quality might better shine in the REPL and the libraries.

Beyond the first impressions a newcomer might have, my concern is rather how I can pursue this quest for quality along the whole chain (encoding, printing, editing, …) without having to know all the internal details of Unicode.

For instance, I find it too bad that the lambda-term example for a REPL (https://github.com/diml/lambda-term/blob/master/examples/repl.ml) can manage simple Unicode but gets lost when we try to edit Japanese such as “私は瀧です”.


I wrote a meant-to-be-friendly answer to Anders and invited him here. :slight_smile:

PS: Ah, I should have read more, as the author made it here. Under a different name – yes I’m desperately looking for apologies.


Same nick and full name as on GitHub and medium. Unclear why this forum rendered my name as my nick and not my real name though :stuck_out_tongue:


I also realised afterwards that you are known here as boxed (Anders Hovmöller) and on the other site as Anders Hovmöller (boxed). That was enough to fool me. :wink:


Thanks, @boxed, by the way, for stopping by and giving more information. As you have seen, your post hit on some things that were already being worked on, and set some others in motion. The immediate effect may not look like big changes of course, but I think we’re improving piece by piece.


Thanks! I’ll try to think of a good process, and then probably start a new topic soonish on how to update the manual’s CSS.


Just one additional comment about the earlier discussion of whether new programmers come to a language looking for multithreading: maybe not, but programmers newly interested in functional programming will sometimes come looking for multithreading (on multiple cores). Since FP still seems to be a weird stretch for many programmers, they need strong motivation to try any functional language at all. From what I’ve seen with Clojure, one of the main selling points of FP for those new to it is the way that FP can simplify taking advantage of multiple cores (e.g. here and here, but I’ve seen this point made repeatedly). I think that someone who’s sold on FP for the sake of its ability to simplify exploiting multiple cores, but who wants static typing or native code compilation, might look at OCaml and then be surprised that a well-established FP language doesn’t support easy multicore multithreading. (In the end this issue will just go away with the new multicore revision, I believe.)


Should we change the name “toplevel” to “REPL”? While “interactive toplevel” is an accepted name, “REPL” seems much more common, and using it should remove a (minor) pain point for people new to the language. If there is agreement, I’ll update the website ocaml.org.


No. The rest of the ecosystem’s documentation and tooling would become inconsistent.

Please let’s focus on the real pain points.