What are the biggest reasons newcomers give up on OCaml?

If you’re talking about separate signatures and structures, it’s not a convention; it’s a fundamental property of the language. And there are two kinds of types in OCaml. The bifurcation between signature (module type) and structure applies to modules, not “ordinary” types. So it is not true that “types and implementations live in separate files”. It is true that there may be some duplication of type declarations in the sig and struct parts of a module but that’s a completely different issue. You might want to take a look at Modules Matter Most.
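To make the duplication point concrete, here is a minimal sketch. The names `SHAPE` and `Shape` are invented for illustration. A variant type that the signature exposes concretely has to be written out a second time in the structure:

```ocaml
(* Signature (module type): what clients may see. *)
module type SHAPE = sig
  type t = Circle of float | Square of float  (* declared here... *)
  val area : t -> float
end

(* Structure: the implementation, sealed with the signature above. *)
module Shape : SHAPE = struct
  type t = Circle of float | Square of float  (* ...and repeated here *)

  let area = function
    | Circle r -> Float.pi *. r *. r
    | Square s -> s *. s
end

let () = assert (Shape.area (Shape.Square 2.0) = 4.0)
```

The repetition is a property of exposing a concrete type, not of splitting code across files.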

It’s a cultural thing, and I think the OCaml culture is wrong there.

Annotating types is f-ing great and makes code a lot more readable. I
almost always annotate at least the return type now. I wish more people
would do the same.
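A small sketch of what that looks like in practice (the `mean` functions are made up for the example). Both definitions get the same inferred type; the second just states it where the reader can see it:

```ocaml
(* Unannotated: the reader has to work out float list -> float option. *)
let mean xs =
  match xs with
  | [] -> None
  | _ -> Some (List.fold_left ( +. ) 0.0 xs /. float_of_int (List.length xs))

(* Annotated argument and return type: same code, type visible at a glance. *)
let mean_annotated (xs : float list) : float option =
  match xs with
  | [] -> None
  | _ -> Some (List.fold_left ( +. ) 0.0 xs /. float_of_int (List.length xs))

let () = assert (mean [ 1.0; 2.0; 3.0 ] = mean_annotated [ 1.0; 2.0; 3.0 ])
```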

11 Likes

:grin:

There’s an old joke about a FORTH programmer who comes across some FORTH code he wrote a few years previously, and he doesn’t understand what it does. What’s worse, he reads the comment at the top of the page, and it says “this code is so self-explanatory it doesn’t need a comment!” So he buckles down and skulls the code for hours, and eventually

  1. he understands the code, what it does, etc
  2. and he concludes that YEP! it’s so self-explanatory that, YEP! it doesn’t need a comment!

Ha!

OK. Anyway, thing is, there are different sorts of programmers in the world. To this day, I don’t comment my code except for complex algorithms that are tricky and really need a comment for the algorithm. I never add comments for APIs/types/interfaces/functions. I’ve worked with code (of my own) that is 30yr old. And my invariable rule is: if I cannot figure it out by reading the code carefully, then that is my fault, not somebody else’s. The only concession I’ve made to this rule is that I try hard to write copious unit-tests, and those unit-tests are a form of documentation, b/c they explain how the code should be used, shouldn’t be used, and how it behaves.

But otherwise? I kinda feel like “Use the Source, Luke” is about right. Or as someone once said to me at Google: “Sometimes the best documentation is the code itself, no?”

I so much agree with the greatness of .mli for API design. I sometimes have to code in other languages (Python, Java, etc.) and it’s insane they don’t have this. Most of software engineering is about defining “interfaces” and hiding complexity behind those interfaces, and yet those languages don’t have a way to define an interface … Yes, you can use doxygen and javadoc, but that misses the point, which is being able to focus on just the interface while designing.
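A single-file sketch of the idea, assuming a hypothetical `Counter` module. In a real project the signature would sit in counter.mli and the structure in counter.ml; here a module type plays the role of the .mli:

```ocaml
(* What would go in counter.mli: the whole API, and nothing else. *)
module type COUNTER = sig
  type t                    (* abstract: clients cannot see the representation *)
  val zero : t
  val incr : t -> t
  val to_int : t -> int
end

(* What would go in counter.ml: free to change without touching clients. *)
module Counter : COUNTER = struct
  type t = int
  let zero = 0
  let incr n = n + 1
  let to_int n = n
end

let () = assert (Counter.(to_int (incr (incr zero))) = 2)
```

Because `t` is abstract, `Counter.zero + 1` is a type error outside the module, so the interface is the only way in.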

9 Likes

I think we need to distinguish between people who are newcomers to programming from people who are experienced programmers but newcomers to OCaml.

For the latter, it’s been instructive watching where ThePrimeagen struggles. (He’s a tech influencer and very experienced programmer who is learning OCaml right now.) The same could be said of TJ (neovim dev), who is doing something similar.

It’s too easy for people who’ve been using the language for a long time to misjudge where the rough spots are. I think a few people on this forum have been popping into those streams to provide help. But from what I observe:

  1. Tutorial confusion between the different standard libraries, between adding stuff via opam and to the project with dune, and with different libraries for similar things not playing nice together.

  2. Learning resources written by academics for a classroom setting and not for professionals on their 3rd+ language who don’t need to learn basic CS concepts.

  3. Lack of libraries for many tasks, and no seamless way to access them. Lots of common things just don’t have a mature library. Unlike other languages, you can’t fall back to some other language on the same virtual machine. There are probably C++ libraries for what you need, but creating bindings is a chore. The same goes for calling into OCaml from something higher level. It is seamless to call between R and Python. It is easy to call from those into C, C++, Fortran, or Rust. Even Haskell has a solid set of R bindings that make it pretty painless. OCaml really only has the C FFI. And you need boilerplate on both sides to get things working well. As a result, unless there’s already an OCaml library for something, it’s hard to use it for a hobby project and thus those projects don’t get made and added to the ecosystem.

  4. Lack of advanced learning resources. Most of the tutorials we have stop at the basics. But where are resources teaching how and when to use GADTs, polymorphic variants, and the rest? If you search around like someone starting out, it seems like it would be very hard to learn the advanced parts of OCaml. But there are plenty of resources for learning unsafe rust or using relaxed memory models in C++. And unless something is a misfeature, then any sufficiently large project is going to run into a need for this stuff. Not having confidence that you could learn it when the time comes is an issue.

As for the other group, the students new to CS, students and universities need to just accept that no university course is ever going to teach them a language to the point that they could meaningfully put it on their resume. Being able to get educational toy programs to compile isn’t knowing a language.

9 Likes

Without disagreeing with your point, I wanted to add that it’s actually not really complicated with ctypes to write C bindings. There’s no need for boilerplate on the C side. It’s even easier now that dune has some integrated support (though I still have to copy-paste the dune part from the doc, can’t write it up from scratch). Real World OCaml and dune have relatively decent documentation on this topic. Ctypes also helps with reverse bindings (exposing some OCaml to be called as if it was a C library).

4 Likes

Without disagreeing with you here, I just wanted to point out that “the basics” (functions, algebraic datatypes, modules) can actually take you surprisingly far. I think that 90% to 95% of all the code that I see every day at work uses only the basics. And that is great, because using the more advanced parts of the language (GADTs, polymorphic variants, etc) comes with its own complexities which make reasoning about code using these features correspondingly harder.
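For readers wondering what that step up looks like, here is a small sketch. The `expr` type is a textbook-style illustration, not taken from any codebase mentioned in this thread. With a plain variant, `eval` would have to return a tagged value and check tags at run time; the GADT version lets the type checker track the result type, at the price of the `type a.` annotation and less forgiving error messages:

```ocaml
(* A GADT: the type parameter records what each expression evaluates to. *)
type _ expr =
  | Int : int -> int expr
  | Bool : bool -> bool expr
  | Add : int expr * int expr -> int expr
  | If : bool expr * 'a expr * 'a expr -> 'a expr

(* The locally abstract type [a] is required for the recursion to type-check. *)
let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Bool b -> b
  | Add (x, y) -> eval x + eval y
  | If (c, t, e) -> if eval c then eval t else eval e

let () = assert (eval (If (Bool true, Add (Int 1, Int 2), Int 0)) = 3)
```

An ill-typed term such as `Add (Bool true, Int 1)` is rejected at compile time, which is the benefit being paid for.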

Cheers,
Nicolas

5 Likes

Without disagreeing with you here (it’s starting to feel like a running gag :slight_smile: ), whenever that argument is put forward, it would be nice to give a specific list of missing items.
E.g. how do we distinguish between libraries that exist but are hard to find because the discovery mechanism is not good enough, and libraries that were never written in the first place.

We know opam has fewer entries than npm (I’d argue it’s a good thing!) but what are people really missing?

4 Likes

I think a lot of these qualifications miss my point. We are talking about “Why do newcomers give up on OCaml?”, not “How do people familiar with OCaml handle these things?”

“Can’t find a relevant library”. Sure, it would be nice if random people who looked at OCaml and decided against using it would drop in and tell us why. But that’s a huge ask. Good libraries drive language adoption. But the people who bounce off of OCaml due to a lack of libraries aren’t the people who will be around to talk about what is missing.

The best we can do is ask the people we do have for areas where they’d like to use more OCaml and currently don’t or can’t. But, realistically, we have no way of knowing how impactful that would be. We already have huge sampling bias as a result of asking people who regularly use OCaml to begin with. For example, I could give a lot of data science examples. But maybe we’d get more users if we had good GUI tools or support for real-time audio processing. No one here can possibly know what arbitrary people who aren’t using the language are working on.

“Learning resources for advanced features”. Yes 95% of the code doesn’t use them. But 5% does. And 100% of the people who are capable, advanced users of the language will need to know them because in a large enough project, sooner or later you will want to consider them. I know from looking at the Rust website that unsafe features and manual lifetime annotations are advanced things. But I also know when I’d want to invest in learning them and how I’d go about that. Where do I even point someone who wants to learn the advanced parts of OCaml in any real depth? We need obviously available intermediate and advanced learning materials to give someone confidence that they can become an intermediate or advanced user of the language in time. And similarly, we need introductory material that’s targeted at mature programmers instead of college students. (And as great as Real World OCaml is, the treatment of this stuff is far too short to do the job alone.)

“Language bindings” – it isn’t that complicated if I already know what I’m doing and already have a library that maps well and happens to be in C. But if I have some arbitrary C++ legacy code that was only ever intended to be used by C++ applications, it’s far from automatic, and nowhere near as easy as what’s available elsewhere. In R for example, there’s Rcpp and reticulate. (For calling C++ or Python respectively). OCaml doesn’t have libraries for everything, but if using other ecosystems was actually seamless, it would matter less. But “not that complicated” and “automagical in almost all cases” are worlds apart. For a sufficiently large project, this doesn’t matter as much. But for a small project like the kind of stuff that would get a newcomer off the ground, having to go through extra steps or write boilerplate that’s a substantial portion of the code is one more point of friction.

These are all examples of friction – extra steps and decision points that each cause some fraction of people considering the language to bounce. This same phenomenon is why marketing people are so obsessed with minimizing the number of clicks it takes for someone to buy a thing or sign up for something.

If we want more people to use the language, we need to make it easier to use it for more random small projects. A good way to test this would be to take some Perl or Python “throw away” code and try to do an equivalent in OCaml. Is it as easy? If not, that’s a potential point for improvement.

4 Likes

Just saw this thread for the first time, but I definitely think that the complexity of the tooling vis-a-vis things like go and rust/cargo is a huge difference that probably accounts for a lot of beginners (as well as experienced people) jumping ship. Another thing is, go produces static executables by default without seemingly any fuss at all — whereas we have seen in another recent thread that there are one zillion non-working ways to do this in OCaml, and one zillion more reasons why technically-in-the-know people seem to think this basic expectation is unreasonable. I understand that there are technical challenges and trade-offs to all these things, but I think there is a pattern of where some desirable property of tooling sounds totally impossible or unworkable or unreasonable in the OCaml community, but is just totally fine and unmentioned (because it is so basic) elsewhere.

4 Likes

Without disagreeing with you here (it’s starting to feel like a running gag :slight_smile: ), whenever that argument is put forward, it would be nice to give a specific list of missing items.

Some examples:

  • JSON: use yojson, jsonm, or ezjsonm?
  • Use lwt or async?
  • Use result or rresult?
  • Use dune or not? Some high-profile packages are not using dune.
  • Try using an http client library
  • Real World OCaml is often recommended as a book but it is heavily biased towards Jane Street Libraries which are their own ecosystem - something beginners don’t realise
  • How to deal with unicode and UTF8? It’s not explicit

These are just a few examples. If you look at other languages (Go, Gleam, Vlang) many of them have a standard library that makes these decisions for you. You are still free to use a more advanced HTTP library but you are not facing these decisions right at the beginning.

6 Likes

From Rresult / Erratique or opam info rresult:

OCaml 4.08 provides the Stdlib.Result module which you should prefer to Rresult
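For anyone who has not used it, a minimal sketch of `Stdlib.Result` on its own, with no extra package. The helper names (`parse_int`, `safe_div`, `divide_strings`) are invented for the example; `Result.bind` and the `let*` binding-operator syntax are both available from OCaml 4.08:

```ocaml
(* Bind operator for result, defined locally from the Stdlib function. *)
let ( let* ) = Result.bind

let parse_int (s : string) : (int, string) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an integer: " ^ s)

let safe_div (a : int) (b : int) : (int, string) result =
  if b = 0 then Error "division by zero" else Ok (a / b)

(* The first Error short-circuits the rest of the computation. *)
let divide_strings (a : string) (b : string) : (int, string) result =
  let* x = parse_int a in
  let* y = parse_int b in
  safe_div x y

let () = assert (divide_strings "10" "2" = Ok 5)
```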

Regarding this:

Please stop propagating the myth. Since OCaml 4.14.0 we have both UTF encoders and decoders in the Stdlib, this is not different from level of Unicode support you get from Go (and yet we never hear anyone complain about the Unicode support of Go).
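A minimal sketch of what that Stdlib support looks like, assuming OCaml 4.14 or later. The two helper names are made up; the calls underneath (`Buffer.add_utf_8_uchar`, `String.get_utf_8_uchar`, `String.is_valid_utf_8`, `Uchar.utf_decode_uchar`) are Stdlib functions:

```ocaml
(* Encode one code point as UTF-8 with the Stdlib's Buffer encoder. *)
let utf_8_of_code_point (cp : int) : string =
  let b = Buffer.create 4 in
  Buffer.add_utf_8_uchar b (Uchar.of_int cp);
  Buffer.contents b

(* Decode the code point that starts at byte index [i]. *)
let code_point_at (s : string) (i : int) : int =
  Uchar.to_int (Uchar.utf_decode_uchar (String.get_utf_8_uchar s i))

let () =
  let s = utf_8_of_code_point 0xE9 in   (* U+00E9, "é" *)
  assert (s = "\xC3\xA9");              (* two bytes in UTF-8 *)
  assert (String.is_valid_utf_8 s);
  assert (code_point_at s 0 = 0xE9)
```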

OCaml 4.14.0 was released March 2022. UTF-8 was invented 1992.

1 Like

Not sure what your point is here. An OCaml newcomer will likely use the latest version of the compiler so:

That’s not happening if you are just seeking Unicode support that is on par with the one provided by Go.

My point is that OCaml’s standard library is weak (and has a confusing toolchain) and as such is contributing to actual and perceived problems. For most people a language is a means to an end to solve a problem and many languages promise a more direct line to that goal. If you are intimately familiar with OCaml and how its ecosystem evolved you might not see that problem or, convinced by its other strengths, be willing to accept it. Beginners are not in the same position and will give up if they find that other languages get them off the ground quicker.

2 Likes

I think calling that a “myth” is an over-statement. Even though we now have UTF-8 encoders and decoders in the library (thank you for that!), that’s very minimal, and the interface is very low-level. I don’t know Go, but we are nowhere near Python in terms of native support for Unicode. As in C, decades ago, the unaware beginner will get it wrong and manipulate bytes instead of characters/code points. The type of bytes is misleadingly named char and is used pervasively; byte-wise indexing has blessed syntax, and every learning resource would spread the confusion (the fact that UTF-8 codecs were only added last year, in a language with two decades of existence, means that folklore and learning resources haven’t integrated them yet). No convenient UTF-8 indexing operator (which of course you should only do consciously in a serious and performance-concerned application, but it is so handy). No pretty-printing of Uchar.t. No syntax for Uchar.t literals (I think?). Which also means no pattern-matching of code points against literals. No support for UTF-8 in Str nor in the go-to library for regular expressions (re).

Edit: no normalization.
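To illustrate the byte versus code point trap, here is a hedged sketch assuming OCaml 4.14 or later. `fold_utf_8`, `count_scalar_values` and `is_e_acute` are helpers written for this example, not Stdlib functions; they show that the building blocks are there but the convenience layer has to be written by hand:

```ocaml
(* Fold over the Unicode scalar values of a UTF-8 string.
   Malformed bytes are seen as U+FFFD, the replacement character. *)
let fold_utf_8 (f : 'a -> Uchar.t -> 'a) (init : 'a) (s : string) : 'a =
  let rec go i acc =
    if i >= String.length s then acc
    else
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d) (f acc (Uchar.utf_decode_uchar d))
  in
  go 0 init

let count_scalar_values (s : string) : int = fold_utf_8 (fun n _ -> n + 1) 0 s

(* No Uchar.t literals in patterns, so match on the integer value instead. *)
let is_e_acute (u : Uchar.t) : bool =
  match Uchar.to_int u with 0x00E9 -> true | _ -> false

let () =
  let s = "h\xC3\xA9!" in               (* "hé!" written with explicit bytes *)
  assert (String.length s = 4);         (* length counts bytes *)
  assert (s.[1] = '\xC3');              (* indexing yields a byte, half of "é" *)
  assert (count_scalar_values s = 3);
  assert (fold_utf_8 (fun found u -> found || is_e_acute u) false s)
```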

BTW, on the topic of incomplete or confusing set of libraries, add regexes to the list; we do have a regex library in the standard library (Str) but it is limited and everyone says you should not use it.

3 Likes

From my point of view, the confusion may come more from our ability to hide the details you mention rather than offering a minimal API that is transparent about what’s going on.

It’s always difficult to strike the right balance between the apparent simplicity of a system and the need to expose the implementation details that developers must understand, along with the implications these may have for their applications.

The example of the index on the byte or on the UTF-8 code-point is an example that can be fatal in terms of performance when it comes to working on a user interface. Do we want to let developers not ask themselves this question?

The value of such ‘helpers’ is also very limited. If it’s just a matter of a novice being able to write a little code as quickly as possible, there’s really no point. Someone with experience wouldn’t use these functions for a whole host of legitimate reasons.

If we’re talking about learning when it comes to these issues, it’s actually better to grasp what UTF-8 really is (in all its complexity), learn to use it with all the implications that have been explained to us (through usage or documentation) and finally not have to use such “novice” functions since we’re better able to know what we really need according to our objectives.

As chance would have it I’ve done just that recently. My daughter needs to apply to some sixth-form (17/18 year old, UK) colleges and I tinkered with a little web-frontend to some downloaded data. I’m just experienced enough to dig my way out of the little problems I encountered, but not so experienced that I don’t make them.

I tried out dream-html because I saw the author had mentioned it recently in this forum. The main package page (dream-html 1.2.0 on the OCaml Package site) says there aren’t any docs (as of 2023-09-05), whereas if you click through to the GitHub repository (yawaramin/dream-html) you get a nice README with examples and everything.

HTML tags in dream-html are basically used as tagname [list of attrs] [list of child tags] with some suitable type constraints to catch basic errors. That all worked as well as you could ask until I had to do something a little unusual.

textarea [ id "notescontrol"; name "notes" ] "%s" c.notes

You’ll note that there isn’t a list after it - there is just a format-string and value because textarea doesn’t have nested tags. Took me a few goes to get that though. If you get that wrong and put a list in then you get this.

File "bin/main.ml", line 292, characters 53-70:
292 |       textarea [ id "notescontrol"; name "notes" ] [ txt "%s" c.notes];
                                                           ^^^^^^^^^^^^^^^^^
Error: This variant expression is expected to have type
         ('a, unit, string, node) format4
       There is no constructor :: within type format6

Now, I’ve seen errors about “formatX” before and know it’s because of some fairly complex stuff that must be going on to allow format strings to be type-checked. But - that error does not say “you put a list here and we don’t want that” - not unless you already know that is what it means.

As luck would have it, the library not only has docs but also an understandable test-file and is (to my beginner eyes) clearly written.

let textarea attrs fmt = text_tag "textarea" attrs fmt

Ah, of course! It doesn’t want a list of nodes for a textarea - just a string literal or a format-string and value.

292 |       textarea [ id "notescontrol"; name "notes" ] ("%s" c.notes);
                                                          ^^^^
Error: This expression has type string
       This is not a function; it cannot be applied.

I actually did this. I was misled by the fmt and thought “single thing - I’ll bracket the format+value to indicate they go together”. Here I run out of instincts that can help me. The fmt must be getting expanded inline in the list of arguments but it isn’t its own thing even though it has a name.
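What is going on with the format argument can be reproduced with the standard library alone. The sketch below is a stand-in, not dream-html’s implementation: `text_tag` and `textarea` here are hypothetical string-producing versions built on `Printf.ksprintf`, which has the same `format4` shape as in the error message above. The format literal is one argument, and the values it consumes are further arguments of the same call:

```ocaml
(* Hypothetical stand-in for a tag helper that takes a format string.
   Its type is string -> ('a, unit, string, string) format4 -> 'a, so the
   number and types of the remaining arguments depend on the format literal. *)
let text_tag (name : string) fmt =
  Printf.ksprintf
    (fun body -> Printf.sprintf "<%s>%s</%s>" name body name)
    fmt

let textarea fmt = text_tag "textarea" fmt

let () =
  (* "%s" makes textarea expect one more string argument. *)
  assert (textarea "%s" "hello" = "<textarea>hello</textarea>");
  (* "%d of %d" makes it expect two more int arguments. *)
  assert (textarea "%d of %d" 1 2 = "<textarea>1 of 2</textarea>")
  (* textarea ("%s" "hello") is rejected: inside the parentheses the literal
     is typed as a plain string and then applied as if it were a function. *)
```

So the bracketed version fails because parentheses create a separate application, and a string literal in function position has no expected format type to guide it.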

Interestingly, in vim if I correct the error and then highlight "%s" c.notes and look up its type it gives me the same error-text.

So - the issue here I think is that someone with 20+ years of experience can very quickly fall off a cliff with something that should be trivial.

The second significant issue I had was with caqti. I have used it before (very minimally) so can copy and paste working examples for some basic queries. I’d not done an update before though, and knew I’d have to change the infix used. For example, the multi-row operator is ->* and the exactly-one-row is ->!. So, I type “ocaml caqti” into google and get the github repo as the first result - the README doesn’t seem to cover this and doesn’t link to anything that does. I’ve seen these operators listed somewhere though. Second result is opam - caqti which I realise is the old package list. OK, go to ocaml.org and search from there. “No Docs” for the current version of the package, select a slightly older one and that has a docs button.

Except… no actual docs (AFAICT) - mentions dependencies on things like “ptime” but just lists two top-level libraries. Now I knew I’d seen lots of docs for caqti, but they just don’t seem to be linked from anywhere.

Eventually I managed to get to Infix (caqti.Caqti_request.Infix) but I still have no idea how to actually navigate to that by following some top-level links. The “Caqti API Reference” from the github page must lead there somehow.

So - I don’t know exactly what isn’t right with docs searchability but it wasn’t bringing me much luck.

And finally, this is what dune fmt does to my database-interface code.

let db_college =
  let encode
      {
        urn;
        laname;
        schname;
        postcode;
        schstatus;
        schooltype;
        agelow;
        agehigh;
        gender;
        relchar;
        all_he;
        all_russell;
        all_oxbridge;
        home_postcode;
        mins_from_home_postcode;
        rating;
        notes;
      } =
    Ok
      ( urn,
        ( laname,
          ( schname,
            ( postcode,
              ( schstatus,
                ( schooltype,
                  ( agelow,
                    ( agehigh,
                      ( gender,
                        ( relchar,
                          ( all_he,
                            ( all_russell,
                              ( all_oxbridge,
                                ( home_postcode,
                                  (mins_from_home_postcode, (rating, notes)) )
                              ) ) ) ) ) ) ) ) ) ) ) ) )

and

  let _query =
    ((int & string & string & string) ->. unit)
    @@ "INSERT INTO college_notes\n\
       \        VALUES (?, ?, '?', ?)\n\
       \        ON CONFLICT (urn, home_pc)\n\
       \        DO UPDATE SET notes=?\n\
       \        WHERE urn=excluded.urn AND home_pc=excluded.home_pc"
  in

Marching off the right-hand side of the page is the sort of thing I expect from a corner-case with a code formatter. But mangling my queries and rendering them un-copy-paste-able because I dared to let them extend over more than one line? That’s not much fun.
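One possible workaround, offered as a sketch rather than a fix for the formatter: OCaml’s quoted string literals (`{id|...|id}`) need no escapes, and as far as I know ocamlformat leaves their contents alone. The binding name `upsert_notes_sql` is invented, and the query text is copied from the snippet above as-is:

```ocaml
(* A quoted string literal: newlines and quotes are taken verbatim, so the
   SQL can be pasted in and copied back out unchanged. *)
let upsert_notes_sql =
  {sql|
    INSERT INTO college_notes
    VALUES (?, ?, '?', ?)
    ON CONFLICT (urn, home_pc)
    DO UPDATE SET notes=?
    WHERE urn=excluded.urn AND home_pc=excluded.home_pc
  |sql}

let () =
  (* No backslash escapes ended up in the query text. *)
  assert (not (String.contains upsert_notes_sql '\\'))
```

The literal keeps its leading newline and indentation, which SQL ignores.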

So:

  1. You fall off a cliff-edge when you hit a slightly-more-complex error.
  2. When docs are there, they don’t seem to be easily findable.
  3. There is a difference between reformatting code and messing with my string literals and I don’t appreciate the second one.

None of this is to do with the complexities of handling lwt + caqti themselves and how that interacts with dream as the web backend. That’s no fun, but I understand what it is there for even if I don’t have any valid instincts for it yet. These three are just zero-benefit things. They aren’t anything to do with the program I was writing - just useless bits I had to get right to get the program to run.

15 Likes

My feelings too. For anyone to survive in the ocaml landscape, and I use the word “survive” without exaggeration, you have to really persist.

You have to read actual code for docs, you have to carefully comb the actual docs, and then you have to use this forum or discord. Gpt4 has slightly alleviated some of my pain.

For example, like your caqti query, I was able to ask it to generate one from my sql, and add my custom encoders and decoders. Took many tries but it was certainly faster than fiddling over just .mli type signatures. By the end of it all, I’m just happy I was successful in building my experiments. So there’s motivation for the future. But I also get too exhausted to contribute back other than just bits of helpful gists and GitHub issues. This is the journey that I think most newcomers have to endure (if coming from and using ocaml for web development).

Reminds me of this essay. The looming demise of the 10x developer: Why an era of enthusiast programmers is coming to an end

Their claim is that the majority of the new wave of developers are less likely to be super passionate about programming. Passion equates to grit and survival. That’s a fair assessment. We sometimes just want to get paid and go home too.

Helpful conclusions? If we want newcomers to join, yes we have to compare ocaml’s get-started and typical get-work-done flows to python, java, c#. @sabine @tmattio and co. have seriously leveled up the get-started flow. To help others get shit done, write and share more, whether that’s to open source more, share all your ocaml examples regardless of status (toy, production).

5 Likes

Oh no it is a myth.

Almost every newcomer blog post out there mentions the terrible Unicode support in OCaml while this support is neither better nor worse than Go’s one. But somehow I never read that in blog posts about Go. The prejudice is passed around without much thinking from people whose understanding of Unicode does not seem to go beyond some kind of label on a tin.

Last time I looked into python Unicode support, it didn’t look good I’m afraid. As always better no support than broken support.

So I’m not afraid to say that nowadays OCaml has outstanding Unicode support. Here’s the new myth to propagate since apparently everything is advertisement nowadays. OCaml :heart: Unicode.

I have had to repeat so many times how indexing Unicode scalar values is mostly useless that I won’t bother repeating it here.

And it should be added that if I had tried to push to integrate the uutf Unicode character folders 11 years ago or this design 5 years ago you would have had terribly inefficient or terribly designed decoders in the Stdlib.

Trust me, it was worth the wait. You can’t imagine how many good properties are packed in the design that was upstreamed. You get both a foolproof standard and a controllable strategy for handling decoding errors, with an efficient, unboxed and exceptionless design.
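A rough sketch of how two error strategies can be built on the same decoder, assuming OCaml 4.14 or later. `decode_strict` and `decode_lenient` are names made up for this example; the decode result is a plain value you inspect, and nothing here raises:

```ocaml
(* Strict strategy: reject the whole input if any byte sequence is malformed. *)
let decode_strict (s : string) : (string, string) result =
  if String.is_valid_utf_8 s then Ok s else Error "malformed UTF-8"

(* Lenient strategy: re-encode, replacing each malformed sequence by U+FFFD.
   Uchar.utf_decode_uchar already returns U+FFFD for an invalid decode. *)
let decode_lenient (s : string) : string =
  let b = Buffer.create (String.length s) in
  let rec go i =
    if i < String.length s then begin
      let d = String.get_utf_8_uchar s i in
      Buffer.add_utf_8_uchar b (Uchar.utf_decode_uchar d);
      go (i + Uchar.utf_decode_length d)
    end
  in
  go 0;
  Buffer.contents b

let () =
  let bad = "a\xFFb" in                 (* 0xFF can never appear in UTF-8 *)
  assert (not (Uchar.utf_decode_is_valid (String.get_utf_8_uchar bad 1)));
  assert (decode_strict bad = Error "malformed UTF-8");
  assert (decode_lenient bad = "a\xEF\xBF\xBDb")
```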

I don’t know why it took my brain so long to find it but I’m glad I waited for the right design before making the proposal. Remember the Stdlib is mostly append only.

16 Likes