What are the biggest reasons newcomers give up on OCaml?

Yeah. People who hold up Python as their example of Unicode support done right make Raku programmers shake their heads, purse their lips, and then facepalm.

I don’t have a computer nearby to verify but I think you can make ocamlformat respect your string literals by using the dedicated/alternative syntax:

let s = {| Your string literal here |} ;;
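For anyone who hasn't met that syntax, here is a minimal sketch of quoted string literals. It only illustrates the syntax itself; whether ocamlformat leaves such literals alone is the unverified claim above, and the example strings are made up.

```ocaml
(* Quoted string literals: nothing between the delimiters is interpreted,
   so backslashes and double quotes need no escaping. *)
let plain = {|C:\temp\new "file".txt|}

(* The same text as an ordinary literal, with every special character escaped. *)
let escaped = "C:\\temp\\new \"file\".txt"

(* An identifier between the brace and the bar lets the text itself
   contain a bar followed by a closing brace. *)
let with_bar = {sql|SELECT '|}' AS closing_delimiter|sql}

let () =
  assert (String.equal plain escaped);
  print_endline with_bar
```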

And so it should be. Libraries for UTF-8 which provide indexing and the like are a trap for the inexperienced, who don’t realize that such indexing does not have random-access (O(1)) complexity. Better to get used from the outset to regarding UTF-8 text as an opaque blob to which decoders can be applied.
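As a rough sketch of the "apply a decoder to the blob" style, assuming OCaml 4.14 or later (for String.get_utf_8_uchar); fold_utf_8 and count_scalars are names made up for this illustration, not standard library functions:

```ocaml
(* Fold over the Unicode scalar values of a UTF-8 encoded string.
   Requires OCaml >= 4.14 for String.get_utf_8_uchar. *)
let fold_utf_8 (f : 'a -> Uchar.t -> 'a) (init : 'a) (s : string) : 'a =
  let rec loop acc i =
    if i >= String.length s then acc
    else
      let d = String.get_utf_8_uchar s i in
      (* Invalid bytes decode to U+FFFD and still advance by at least one byte. *)
      loop (f acc (Uchar.utf_decode_uchar d)) (i + Uchar.utf_decode_length d)
  in
  loop init 0

(* Counting scalar values is one pass over the bytes. *)
let count_scalars s = fold_utf_8 (fun n _ -> n + 1) 0 s

let () =
  let s = "caf\xC3\xA9" in
  Printf.printf "%d bytes, %d scalar values\n" (String.length s) (count_scalars s)
```

The string is never indexed by character position; the decoder walks it once from left to right.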

4 Likes

To help others get shit done, write and share more: open source more of your code, and share all your OCaml examples regardless of their status (toy or production).

Can confirm, there’s extreme demand for this, even on weird places like Twitter (now X).

Show people your happy path using OCaml and specific libraries from OCaml. It may be trivial to you now, but seeing the happy path by looking at some docs is still very hard. Heck, this may even inspire

  • better documentation,
  • improvements to the tools,
  • ideas from others on how to optimize your own way of doing things

Cool things happen when the private becomes a little more public. And don’t hesitate either to show when something isn’t perfect - this is inspiring for people who are looking for an opportunity to fix and improve something. :slight_smile:

7 Likes

How about documenting the complexity? It’s not like big O notation takes much time to write.
Even Haskell’s Text managed to do this, and it’s not like Haskell is the poster child for good documentation.
https://hackage.haskell.org/package/text-2.1/docs/Data-Text.html#g:24

3 Likes

I was once convinced of that, but I don’t find the argument that compelling when the core language features a type named string which seems to represent textual data, a type named char which seems to represent a “character” (whatever that is), and an indexing operation from string to char, that everyone uses. And no separate type for true text. Instead of falling into the trap that UTF-8 indexing is not constant-time, beginners will fall into the trap of manipulating bytes instead of meaningful characters, which I don’t think is preferable.

There is no dichotomy between byte-wise accesses and UTF-8 support. You can have both. The issue I tried to point to is not so much the non-existence of String.get_utf8 : string -> int -> Uchar.t, as the general lack of visibility of UTF-8 manipulation, which receives much less highlight than byte fiddling.

IMHO It makes sense in the context of scripting, which is a valid use-case for OCaml too, and is close to “novice writing a little code as quickly as possible”, except not restricted to novices. Not everything has to be performance-heavy code nor responsive GUIs.

Lately I had to write moderately-sized text-processing programs. In all modesty, I don’t think I qualify as a novice programmer, and I am well-versed in the inner workings of UTF-8. My inputs are controlled (language-specific, so no arbitrary composing sequences and weird control characters and whatnot) and I do sanitize them (see: normalization). My strings are short (hardly longer than 30 code points). I have to run hundreds of regexes, substring searches, substitutions, slicings, ad hoc tests… on each of them, at various stages of the program. I might have gone to great lengths to optimize my program so that it traverses UTF-8 strings minimally, groups regexes, and keeps track of dozens of indexes and flags in a spaghetti style. This is the style of code I would have tended towards if all I had was these low-level primitives (in fact no: if I only had these primitives, I wouldn’t have written the said programs at all). All of that would have been a disproportionate amount of effort whose outcome would have been absolutely ad hoc, unreadable, unmaintainable code. Instead I stuck to the most literate programming style possible, which sometimes means string indexing. Of course I did it in Python; UTF-8 and regexes are among the reasons (there are unrelated reasons too).

It is not a real-time task. It needs to run twice a month. Speed is not crucial. It is slow anyway for other reasons. It is the kind of program you let run in a corner while doing something else.

So, repeating my own words, it is handy.

I should also mention that string indexing in Python is in fact constant-time, because Python’s str uses a fixed-width encoding, if I read this correctly. It adjusts the character width depending on contained data (not that we want that implementation complexity in OCaml). And, since I never have code points greater than U+FFFF (which is true of most applications unless you care about fancy emojis :camel:), and most of my strings even fit in the Latin-1 range, it is in fact quite space-efficient as well.

I cannot help but remark that you (plural you) jumped on the matter of UTF-8 indexing and ignored the other shortcomings I pointed to in my previous message, which substantiate the claim that OCaml’s support for Unicode is improvable.

I’m not sure how the comparison with Go came into this discussion, but I don’t care that people apparently don’t complain about Go. I do complain about OCaml.

The Python implementation having a bug does not mean its interface is broken. And I don’t find “not being able to do anything” to be better than “very often being able to do the job, unless your need falls into the rare/artificial corner case.” It would arguably be, if the partial support meant a flawed interface that couldn’t be repaired later; but I have yet to see an argument that having a separate type for Unicode strings is a design flaw.

3 Likes

It is very unfortunate that string is really a byte array and char is really a byte. This can definitely cause problems for beginners. However, the first sentence of the String doc is pretty clear:

A string s of length n is an indexable and immutable sequence of n bytes. For historical reasons these bytes are referred to as characters.

And the Unicode operations are clearly marked in the table of contents.
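A small sketch of what that sentence means in practice, assuming OCaml 4.14 or later for the two UTF-8 functions; the example word is my own:

```ocaml
(* The French word for summer, three letters, written as explicit UTF-8 bytes. *)
let s = "\xC3\xA9t\xC3\xA9"

(* String.length counts bytes, and s.[0] is the first byte, not the first letter. *)
let byte_length = String.length s
let first_byte = Char.code s.[0]

(* The Unicode-aware entry points live in the same module (OCaml >= 4.14). *)
let is_utf_8 = String.is_valid_utf_8 s
let first_letter = Uchar.utf_decode_uchar (String.get_utf_8_uchar s 0)

let () =
  Printf.printf "bytes=%d first_byte=0x%02X valid=%b first_letter=U+%04X\n"
    byte_length first_byte is_utf_8 (Uchar.to_int first_letter)
```

The three-letter word reports a length of 5, and its first "character" is the byte 0xC3 rather than U+00E9.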

What else should be done to make the situation clear in the doc?

Separately, I’m sure there is a need for higher-level convenience APIs, such as easy ways to do case insensitive string comparisons. The uucp library seems quite good to me and I would gladly write my own higher level utilities based on it. But I suspect it does not meet the needs of everyone, particularly for cases where performance is not a big issue or a quick script is needed. For example, convenience APIs could make it easier to do what these examples do:
https://erratique.ch/software/uucp/doc/Uucp/Case/index.html#caseexamples

Is that the sort of thing you’d like to see?

On indexing strings, Rust gives us a good example to follow. It does not support indexing as such but rather provides a char (Unicode code point) iterator, and of course you can advance the iterator to the nth char. This gives you a way to get the nth char, while making it clear that it is scanning and not directly indexing. OCaml could have a String.to_uchar_seq function. Would having this be important for your application (if it were implemented in OCaml)?
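To make the question concrete, here is one way such a function might look, written as a free function on top of String.get_utf_8_uchar (OCaml 4.14 or later). Both to_uchar_seq and nth_uchar are hypothetical names for this sketch; neither exists in the standard library.

```ocaml
(* Hypothetical String.to_uchar_seq, written as a free function.
   Requires OCaml >= 4.14. Invalid bytes are reported as U+FFFD. *)
let to_uchar_seq (s : string) : Uchar.t Seq.t =
  let rec from i () =
    if i >= String.length s then Seq.Nil
    else
      let d = String.get_utf_8_uchar s i in
      Seq.Cons (Uchar.utf_decode_uchar d, from (i + Uchar.utf_decode_length d))
  in
  from 0

(* Getting the n-th scalar value means advancing the sequence n times,
   which keeps the linear cost visible at the call site. *)
let nth_uchar (s : string) (n : int) : Uchar.t option =
  if n < 0 then None
  else
    match Seq.uncons (Seq.drop n (to_uchar_seq s)) with
    | Some (u, _) -> Some u
    | None -> None
```

As in the Rust design, there is no indexing operator; the caller has to write the scan explicitly.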

2 Likes

In 1995, when Java came out, the basic type of String was a wrapper of a unicode-char array. In order to get adequate performance, for our web-app server we wrote an alternate tower of types for byte-strings – BString, BStringBuffer and others. We got significant speedup doing this.

(Unicode) string-indexing presupposes that you’re representing your unicode string as an array of unicode code-points, right? That’s wasteful of memory, isn’t it?
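To put rough numbers on that trade-off, here is a sketch that decodes a UTF-8 string once into a Uchar.t array (OCaml 4.14 or later). The helper name to_uchar_array is made up for the illustration. Indexing the array is O(1), but each element occupies a full machine word, so 8 bytes per code point on a 64-bit system.

```ocaml
(* Decode once into an array of scalar values: O(1) indexing afterwards,
   at the cost of one machine word per element. Requires OCaml >= 4.14. *)
let to_uchar_array (s : string) : Uchar.t array =
  let rec collect acc i =
    if i >= String.length s then Array.of_list (List.rev acc)
    else
      let d = String.get_utf_8_uchar s i in
      collect (Uchar.utf_decode_uchar d :: acc) (i + Uchar.utf_decode_length d)
  in
  collect [] 0

let () =
  let s = "na\xC3\xAFve" in
  let a = to_uchar_array s in
  Printf.printf "%d bytes as UTF-8, %d words as an array\n"
    (String.length s) (Array.length a)
```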

I guess one could argue that such performance issues don’t matter anymore. Maybe. But my experience in the noughties with e-commerce and web-app-based server-side applications (again in Java) is that the performance of string-copying and the memory footprint of string data are actually quite important in determining the performance of web-apps overall.

As many people have said, the OCaml libraries are not very easy to use for beginners. I gave up on OCaml once because of it myself. Now I’ve accepted that the OCaml way is to implement what you need and have fun doing it, and I like that! But that probably won’t work for many/most beginners.

But it doesn’t help just to identify this problem (we all know it’s a big problem right?) without saying what can be done about it. I can either contribute to writing new/better libraries, or not. What else can be done?

1 Like

A first step towards clearing up the convention would be to introduce better names. I think string is fine (it’s definitely the basic storage for text!), but char could be deprecated (using the appropriate [@@deprecated …] attribute; please, dune, don’t make it a warn-error) in favor of a new byte type, which is a lot more accurate.

Ahaha, who am I kidding.

2 Likes

Preface: this is just a suggestion; I could be completely off-base.

When I started working with server-side Java in 1995, there were few libraries available. One of the things that I remember back then, was that people wrote example applications, and those applications drove some development of libraries. Perhaps what’s needed here, is for people to propose applications in other languages (e.g. Python) that they claim are well-written, support unicode well, etc. And then to port those applications to OCaml, thereby exposing the places where things are painful, where they suck, etc? And then people could try to improve those spots.

The/A problem is that there really is a gap between the people who write applications and the people who write libraries and systems code. That gap has existed since nearly the beginning of computing, and it has always been difficult-to-bridge. So for an application-writer to bring an application to the systems-programmer, and to point out in the code where things are painful or suck, is really useful. B/c you can’t expect systems programmers to get the applications expertise to build real/realistic applications themselves.

I write this based on my experience with server-side Java apps in the noughties and teens. Sure, I got to see a lot of application code. But I ceased to write applications by around 1999, b/c … they just got too complicated for me to keep up, and besides I needed to spend my energy digging deeper into Java and server runtimes, not learning the latest Apache Struts framework.

P.S. Concretely, I mean to port as-realistic-as-possible applications from Python to OCaml. Porting a toy application doesn’t help. And obviously, that application will need to be unicode-enabled, which means that the app-writer has to work in multiple languages, etc.

4 Likes

We will keep having the disadvantage of the stdlib not being as flashy as other languages’. There’s nothing that can be done about that - our stdlib development process is too cumbersome and conservative, as it’s connected to the compiler itself, which takes up a lot of resources (and also needs to be conservative). If we had the manpower, I would recommend splitting off the stdlib from the core ocaml repo. Stdlibs need to move at a faster clip. However, we don’t have the manpower, so the main stdlib will remain limited.

So instead, beginners have to bridge that gap and learn to use other libraries. We have to look at the ecosystem as a whole and direct beginners to good solutions.

7 Likes

It’s not a bug, its interface is broken. It doesn’t expose the right model of Unicode text to end users. It exposes an encoding of Unicode with some hacks (AFAIU) to pretend it does not.

That in turn confuses programmers even more than they usually are by Unicode[1] and leads them to conclude that Unicode is broken while their programming language’s exposition of it is.

Eventually this leads to bad programs for end-users and no one likes unhappy end-users.

If you need normalization you have this sweet little library, carefully kept up to date each year on every new Unicode release (yes, that matters too; sometimes the Unicode support of other languages is outdated because it lives in their standard library) and, for the last two years, thanks to funding from the OCaml Software Foundation.

It may not be complete but what you can get here will already get you a long way as far as Unicode processing in OCaml is concerned. As I said OCaml :heart: Unicode.

Fair point. I’m not saying that the built-in OCaml support is as good as it could be. I’m just dispelling the myth that it’s bad or worse than any other language out there, because I’m really tired of it. Everyone has their own little sensitivity :–)


  1. They should not be but that’s another topic.

11 Likes

char could be deprecated (using the
appropriate [@@deprecated …] attribute

Why not? This sounds like a good idea.
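For reference, this is roughly what the mechanism looks like in user code. It is only a sketch: the standard library defines no byte type, and the module and type names below (Compat, byte, character) are invented for the example.

```ocaml
(* A sketch in user code of what the proposal could look like; the standard
   library defines no byte type, so all names below are hypothetical. *)
module Compat = struct
  (* The proposed, more accurate name. *)
  type byte = char

  (* The old-style name, kept for compatibility but flagged on use. *)
  type character = char
  [@@ocaml.deprecated "Use byte: a value of this type is a single byte, not a Unicode character."]
end

(* Code that sticks to the new name compiles without any alert. *)
let first_byte (s : string) : Compat.byte = s.[0]
```

Mentioning Compat.character anywhere would raise the compiler's deprecated alert, which is reported as a warning unless the build configuration turns it into an error.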

Another thing is, Go produces static executables by default without seemingly any fuss at all, whereas we have seen in another recent thread that there are one zillion non-working ways to do this in OCaml, and one zillion more reasons why technically-in-the-know people seem to think this basic expectation is unreasonable.

So the big thing Go does is rid itself of any mandatory C. The runtime for Go is in Go. The system calls are done from scratch, bypassing libc. This gives Go a certain “sovereignty” that most other language implementations lack, and that makes it much easier to ship this stuff easily.

If you’re stuck with C, you’re going to be stuck with a deep long tail of miscellaneous devopsy problems. There are no ways around it, only ways through it, like Zig deciding to simply ship a bunch of C compilers as part of the download.

4 Likes

The reality is a little more complex than that. Firstly, for more than 10 years now, we have been able to make unikernels with OCaml, which implies:

  1. static compilation
  2. even cross-compilation in the case of MirageOS to depend on a kernel such as Solo5

This experimentation has recently been extended to support for Raspberry Pi 4 (of which there have been several experiments), ESP32 and more recently Cosmopolitan with the Esperanto project. It is even possible to run OCaml on microcontrollers with omicrob (which involves a static cross-compilation and binary size optimisation phase).

In other words, it is possible to statically compile an OCaml executable (and ensure its reproducibility!). Cross-compiling is just as possible, even if you are still using cross-paths (which we hope will become a norm).

EDIT: for posterity, I would like to mention DkML too for Windows!

5 Likes

Is it possible? Sure. Is it as easy as GOARCH=... GOOS=... go build ...? Nope. People have talked about this in other threads. OK, Go is reimplementing system calls so they don’t need C to cross-compile. OCaml needs a C compiler. OK. It seems like the next step to make progress on is making it easy to distribute a C compiler with OCaml. Jonah suggested CMake, a standard open source build and distribution tool which would apparently make cross-platform installation, from scratch, a one-command process.

I think there are ways to get there, but first we need the willpower to change our ways :slight_smile:

2 Likes

The docs are good as references (I’m thinking of libraries), but they lack examples and material with a gentler learning curve than the odoc-generated pages.

1 Like

Yes, what I really wanted to say is that there’s a difference between: you can’t do it and you can do it with special means. This kind of feature implies so many things as far as the OCaml compiler is concerned that we shouldn’t polarise the debate by saying that there are a zillion methods that don’t work. That’s actually false and not very nice for those who spend time on it. Surely it’s a matter of aggregating what exists and talking to the right people to get there (and I know that’s a goal of the OCaml team).

2 Likes

It’s not at all false that there are a “zillion methods that don’t work”. If you google for this, you will indeed find a zillion methods that worked at one time, but do not actually work anymore; often the fault for some method no longer working does not lie with OCaml, but rather with changes in operating systems or non-OCaml compilers (clang, etc.). The point, however, is that there is no place I can look and see a quick and reliable way to build a static executable. That much is true, and that is what I was claiming, which you have perceived as “polarizing the debate with false claims”.

This does not preclude that there is in fact a way that works, and I’m very glad to hear that there is, but the fact that nobody pointed this out in the other thread shows that this knowledge is very rare and that the method is probably not so easy to apply yet. I do not mean to polarize the debate. But what I am telling you is a starting point, for getting this great knowledge and wisdom that has accumulated in (e.g.) Mirage and making it accessible to ordinary users of OCaml.

You talk about talking to the right people to “get there”, but I thought by posting in the other thread, I was talking to the “right people”. I think the original poster of that thread probably assumed the same thing… Since no working solution was proposed there, it seems to me that the problem may be a bit more complex to solve than you say.

3 Likes