We’re adding an “OCaml Cookbook” feature to OCaml.org where the community can share recipes (small code snippets) for common tasks. These recipes can use packages from the OCaml ecosystem.
I am tempted to take inspiration in spirit primarily from the Rust Cookbook (Table of Contents - Rust Cookbook), since this is focused entirely on tasks that may be required for shipping applications to production.
However, people also mentioned that they like the format of Go by Example (Go by Example: Sorting) a lot.
The current prototype of the cookbook on OCaml Cookbook combines both of these ideas.
The final version may be simpler. For example, we might find that no recipe needs more than a single file, in which case we’ll remove the ability to have multiple files in a recipe.
Also, I will definitely need your help in completing the list of tasks and creating all those recipes. You can make suggestions on the PR to add tasks.
Mention “Stdlib” with the relevant link if that is the only dependency perhaps?
The output from utop would also be good to have.
Unless you made them actual files and runnable with ocaml <file>?
Yes, long-term all of these code blocks should be runnable. However, we first need support for loading packages in the playground (for all the recipes that need more than stdlib).
Is the index (caqti_ppx_rapper.index) adequate for the Database Cookbook section (if I remove the Base/Core dependency, I guess)? Or perhaps as a guide/tutorial?
EDIT: I have published it on Guides/Guides… but it is not clear what should be published where. I guess that the Cookbook should have pointers to adequate guides. One of the cookbook items is Command lines… and there is already a guide about it.
I guess that for some sections, the choice of library is a point to be decided (or we can divide a section in two if two different sets of libraries are used). Web programming depends heavily on the library too.
Is the index (caqti_ppx_rapper.index) adequate for the Database Cookbook section (if I remove the Base/Core dependency, I guess)? Or perhaps as a guide/tutorial?
Hey Frederic, this is awesome and I see you opened a PR on OCaml.org already for a database guide.
One of the cookbook items is Command lines… and there is already a guide about it.
In general, we will need to sort things and see what goes where.
As I see it, the Cookbook is intended as an entry point that gives short answers centered around code examples and exposing the surface level semantics of relevant packages when faced with a very concrete task. The intent is to get people hacking on things quickly.
In contrast, guides can elaborate more deeply on how things function under the hood. However, I think in many cases these guides should be provided in the documentation of the relevant packages, and not necessarily on OCaml.org. This is related to the maintenance overhead associated with keeping these guides in good shape.
As I see it, at this point, we need to make a comprehensive list of tasks we want to showcase in the cookbook (i.e. go over the list of tasks in the Rust cookbook or other relevant resources, and see which of those we want to ask people to help add), then merge the cookbook, open an issue, and then people can contribute recipes by branching from the main branch.
Making a list of tasks we want recipes for is important because the less “noise” we have in that list, i.e. the higher the density of extremely practical snippets, the more useful the Cookbook will be.
The maintenance issue with guides would be the same with the cookbook. Take Random number generation (mirage-crypto-rng), which happens to already have some cookbook snippets.
In version 0.10.7, Mirage_crypto_rng_unix.initialize has type unit -> unit. In version 0.11.2, it has type ?g:'a -> 'a Mirage_crypto_rng.generator -> unit. (Fortunately, the cookbook is up to date.)
About the database guide: it was there to fill a gap. The ppx_rapper documentation describes the preprocessor very well (more comprehensively than I have done), but we have to figure out how to feed it with a dbh which should match the type Rapper_helper.CONNECTION (I had to figure out that a Lwt Caqti connection is OK…). The example files help, but they were designed with one query… How do we chain multiple queries? (The example is not well designed for that.) And what about error handling, where a Lwt_result monad seems to be the right tool? TL;DR: the guide is mostly about the integration of three libraries (ppx_rapper, caqti, lwt_result)…
I will try to distill the guide into database snippets with a more task-focused mindset. But for now, I am waiting for some directions…
I have some snippets, but don’t know where to send the pull request (cookbook or staging branch? cookbook seems more adequate).
EDIT: I have a (text) encoding article. But I have an issue… On Windows (opam.2.2.0~beta1 and its recommended - sunset - repository), I have to use Camomile.1.0.2, whereas on Linux I have 2.0.0… There is one initialisation line which differs (needed in 1.0.2, mustn’t be typed in 2.0.0). Another difference is that camomile.2.0.0 uses dune-site, which can’t be loaded by utop. camomile.1.0.2 is fine.
It can be interesting, but generating an SQL statement with sprintf is a really bad idea when we can’t trust the input variable. Its Dbi module (ocamldbi package) seems promising but obsolete (ocamldbi → ocaml < 4.06.0).
About the Web, I wasn’t able to apply its mod_netcgi approach… and happily switched to Dream.
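To make the sprintf concern concrete, here is a minimal sketch using only the standard library. The `users` table and the `naive_query` helper are hypothetical names invented for the illustration; no database is involved, the code only prints the SQL text that would be sent.

```ocaml
(* Hypothetical helper: splices untrusted input straight into SQL text. *)
let naive_query name =
  Printf.sprintf "SELECT * FROM users WHERE name = '%s'" name

let () =
  (* A well-behaved input produces the intended statement. *)
  print_endline (naive_query "alice");
  (* A crafted input closes the quote and changes the WHERE clause. *)
  print_endline (naive_query "x' OR '1'='1")
```

The second call prints `SELECT * FROM users WHERE name = 'x' OR '1'='1'`, a statement that matches every row. Libraries that pass values as bound parameters, such as the Caqti/ppx_rapper combination mentioned earlier in the thread, keep the input out of the SQL text and avoid this class of problem.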
Would be a wonderful thing. I think your primary challenge is not getting tasks/recipes but simple discoverability. All the recipes in the world won’t help me if I can’t find the one I need. And if my task doesn’t fit the task descriptions in the cookbook…
Here’s an example from pleac “Modifying a File in Place with Temporary File”.
Suppose that’s what you want to do. What are the chances you would come up with the right search string for the web? Pretty close to zero imho. Of course that’s a toy example, but in my experience it is often near-impossible to come up with a search phrase that expresses what I need.
I think the really critical thing is good old fashioned indexing. It’s a PITA but worth it.
It’s a sad thing that tech largely ignores traditional library science. Google “library science discoverability”.
I guess the Awesome OCaml site, and probably some others, could help with discoverability, and some links would be welcome. I am trying to develop some cookbook items… the filesystem one is quite close to the reference manual, which is easy to apply to a given task. Text encoding? I would have liked a Cookbook about Camomile (that is the issue with functor-based libraries: examples are always welcome). Likewise for Mirage Crypto (some generic functions are happy with a `RSA key, and some others with just a key).
Just a heads up that work on the cookbook will continue next week, thanks for the useful feedback!
Thanks to @Frederic_Loyer’s contributions and @R_Huxton’s feedback, we now have a much better understanding of how we need to structure the content.
And, as @mobileink points out, indeed, indexing this via vector search would be great - I hope we can do something along these lines for the entire search infrastructure on ocaml.org later this year.
The visual design is not yet final, but it works. It is organized in recipes, tasks and categories.
A task is something that needs to be done inside a project. A recipe is a code sample and an explanation of how to perform a task using a combination of packages. Some tasks can be performed using different combinations of libraries; each combination is a different recipe. Categories are groups of tasks or of other categories.
You’ll see most tasks don’t have any recipes. We hope to collect the best recipes. Categories are also open for discussion.
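As a rough illustration of the recipe/task/category structure described above, here is a minimal sketch in OCaml. All type, field, and value names are hypothetical and chosen for this example; this is not how OCaml.org actually stores the cookbook.

```ocaml
(* A recipe solves one task with a given combination of packages. *)
type recipe = { packages : string list; code : string }

(* A task may have zero, one, or several recipes. *)
type task = { title : string; recipes : recipe list }

(* A category groups tasks or other categories. *)
type node =
  | Task of task
  | Category of string * node list

(* Collect the titles of tasks that still have no recipe. *)
let rec tasks_without_recipes = function
  | Task t -> if t.recipes = [] then [ t.title ] else []
  | Category (_, children) -> List.concat_map tasks_without_recipes children

(* Hypothetical sample data. *)
let cookbook =
  Category
    ( "Cookbook",
      [ Category
          ( "Text processing",
            [ Task
                { title = "Sort a list";
                  recipes =
                    [ { packages = []; code = "List.sort compare [3; 1; 2]" } ] };
              Task { title = "Decode UTF-8"; recipes = [] } ] );
        Task { title = "Parse a URL"; recipes = [] } ] )

let () = List.iter print_endline (tasks_without_recipes cookbook)
```

With this sample data the program lists "Decode UTF-8" and "Parse a URL", the two tasks that are still waiting for a recipe.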
I am a bit upset… all of my contributions seem to have disappeared.
SQLite CREATE, INSERT, SELECT? I guess I had two contributions which matched (one with a dedicated SQLite library, and one with the Caqti/ppx_rapper combo), and both have disappeared.
NOTE: the proposed outline separates recipes by database. That is quite normal for packages dedicated to one database, but for Caqti or Petrol, which are multi-database, it is not adequate. There, the choice is driven not by the database but by the level of programming (Caqti/ppx_rapper: decorated SQL statements; Petrol: near ORM level).
UTF-8 processing? I had one proposal with Camomile.
HTTP Get requests? Another one.
Parse a URL from string and access individual parts? Another one (which also covers creating a URL/URI from parts… another recipe idea?)
Sorting Lists and arrays too.
(Just to name a few)
So I am quite puzzled. What should I do to avoid producing ephemeral artwork?
Not a comment about what happened to your contributions but note that for:
There’s absolutely no need to use Camomile. UTF-8 is fully supported by the standard library nowadays. Encoding occurs in Buffer and Bytes and decoding in Bytes and String.
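For readers who have not used these modules, here is a minimal sketch of the encoding side, assuming OCaml 4.14 or later (needed for `String.is_valid_utf_8`). The helper name `utf_8_of_code_points` is made up for this example; `Buffer.add_utf_8_uchar` and `Uchar.of_int` are the standard library functions doing the work.

```ocaml
(* Encode a list of Unicode code points as a UTF-8 string.
   Uchar.of_int raises Invalid_argument if a value is not a
   Unicode scalar value (for example a surrogate). *)
let utf_8_of_code_points code_points =
  let buf = Buffer.create 16 in
  List.iter
    (fun cp -> Buffer.add_utf_8_uchar buf (Uchar.of_int cp))
    code_points;
  Buffer.contents buf

let () =
  (* U+1F926, zero width joiner, U+2642, variation selector 16 *)
  let s = utf_8_of_code_points [ 0x1F926; 0x200D; 0x2642; 0xFE0F ] in
  Printf.printf "%d bytes, valid UTF-8: %b\n"
    (String.length s) (String.is_valid_utf_8 s)
```

The four code points encode to 13 bytes, which is a reminder that `String.length` counts bytes and not characters.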
The only problem is it takes a Unicode expert with the willpower of a saint to make heads or tails of these modules’ docs. With e.g. uuseg, with some head scratching I can at least figure out how to fold over a string’s grapheme clusters to calculate the length. But with the standard library devoid of examples, it’s no surprise that people just ignore it.
… which makes it perfect for an example in the cookbook.
(That being said I highly disagree with your assessment, invoke String.get_utf_8_uchar and follow the types. Regarding uuseg I honestly don’t see what there is to head scratch here).
OK, I’ll bite…how do I calculate the length of this string using only the standard library?
let facepalm = "🤦♂️"
The standard library only talks about Unicode characters. It says nothing about grapheme clusters. Let’s say I want to use String.get_utf_8_uchar, get a utf_decode, get the length of the decode, move the cursor forward by that many bytes in the source string, then repeat for each decoded Unicode character:
# let rec ulen ~off ~len str =
    let dec = String.get_utf_8_uchar str off in
    if Uchar.utf_decode_is_valid dec then
      ulen ~off:(off + Uchar.utf_decode_length dec) ~len:(succ len) str
    else len;;
val ulen : off:int -> len:int -> string -> int = <fun>
# let ulen str = ulen ~off:0 ~len:0 str;;
val ulen : string -> int = <fun>
# ulen "🤦♂️";;
Exception: Invalid_argument "index out of bounds".
Whoops! It’s not as simple as it might seem. I guess we need to handle the exception where we go past the end of the string?
# let rec ulen ~off ~len str =
    match String.get_utf_8_uchar str off with
    | dec ->
      if Uchar.utf_decode_is_valid dec then
        ulen ~off:(off + Uchar.utf_decode_length dec) ~len:(succ len) str
      else len
    | exception Invalid_argument _ -> len;;
val ulen : off:int -> len:int -> string -> int = <fun>
# let ulen str = ulen ~off:0 ~len:0 str;;
val ulen : string -> int = <fun>
# ulen "🤦♂️"
- : int = 4
But that’s not correct either. This string contains one grapheme cluster, and it should give me that length:
# #require "uuseg.string";;
# Uuseg_string.fold_utf_8 `Grapheme_cluster (fun len _ -> len + 1) 0 "🤦🏼♂️";;
- : int = 1
What do you mean by string length? This could be many things.
Note that even grapheme clusters counts will not do it for layout computations. Many other things will happen at your text rendering layer.
Correct here is whatever your brain imagined. Unfortunately things are not that simple and there are many notions of correct in this setting. Text processing is always more subtle than people want it to be.
And ? If you are interested in grapheme clusters then you know it’s not the place.
Please.
First Invalid_argument is never meant to be caught, it means programming error.
Frankly it doesn’t seem too difficult to understand that given an index i and a length l, then indexing the string at the index i + l >= String.length s will result in an out of bounds error.
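A minimal sketch of the bounds-checked loop this points to, assuming OCaml 4.14 or later. The name `scalar_count` is made up for this example. The loop compares the offset with `String.length` before decoding, so `Invalid_argument` is never raised and never needs to be caught. Invalid byte sequences are counted as one replacement character each, which is one possible policy among several. Note that this counts Unicode scalar values, not grapheme clusters.

```ocaml
(* Count the Unicode scalar values in a UTF-8 string. *)
let scalar_count s =
  let n = String.length s in
  let rec loop off count =
    if off >= n then count (* stop before indexing past the end *)
    else
      let dec = String.get_utf_8_uchar s off in
      (* utf_decode_length is always at least 1, so the loop advances
         even when the bytes at [off] are not valid UTF-8. *)
      loop (off + Uchar.utf_decode_length dec) (count + 1)
  in
  loop 0 0

let () =
  (* UTF-8 bytes for U+1F926 U+200D U+2642 U+FE0F, written as escapes. *)
  let s = "\xF0\x9F\xA4\xA6\xE2\x80\x8D\xE2\x99\x82\xEF\xB8\x8F" in
  Printf.printf "%d scalar values in %d bytes\n"
    (scalar_count s) (String.length s)
```

For this four-code-point sequence the function returns 4, matching the count obtained earlier in the thread, while a grapheme cluster segmenter such as uuseg reports a single cluster.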
I mean the thing that reading my entire message makes quite clear, that is the grapheme cluster count, which should be 1 for my example.
Note that even grapheme clusters counts will not do it for layout computations.
Cool, but as I said, my specific example is about the grapheme cluster count. And after reading my entire message, if someone wants to pretend it’s not clear what I’m talking about, it’s on them.
If you are interested in grapheme clusters then you know it’s not the place.
Yes, I do know that, which is why I was surprised when you said ‘UTF-8 is fully supported by the standard library’. Because when people hear that, call me crazy but they will expect to be able to get the grapheme cluster count of a string.
maybe it’s time for you to find a basic programming course or a mentor.
Well, I am so sorry that my quick exploration of the standard library’s Unicode functions does not live up to your standards and that I am apparently not fit to be a programmer for not handling an out of bounds error (one of the most common sources of bugs…) properly.
I think there was a communication failure (lack of communication from our side) here.
We merged some contributions prematurely before it was really clear what the format of the cookbook is and what criteria should be used when reviewing. Your contributions will be restored into PRs so a proper review can happen.
Maybe we should split this into a different thread, but:
They shouldn’t, UTF-8 means the encoding, not all of Unicode. Also, I’m noticing a big cargo cult of “grapheme clusters” as the smallest unit of Unicode text when that’s very rarely what one wants, and supporting them adds a lot of cross-cutting concerns.
For example, what do you mean by “grapheme cluster”? The default extended boundaries? The Unicode annex states that implementations “can and should” tailor the defaults (emphasis theirs). How do we add tailorings? How do we test our tailorings are compatible with our word/sentence/line boundaries? Does the boundary API support random access, and if it does, can we limit the bounds to search for a safe start? And finally, which version of the Unicode property tables will your app support, considering they can carry breaking changes, and why should it depend on your version of OCaml/ICU/uucp (pick your poison)?
The purpose of the segmentation annex is providing defaults, meant to be tailored, to find boundaries of sentences, words and grapheme clusters within a larger text for some given purpose; grapheme clusters here are not special over other kinds of boundaries and aren’t meant to redefine the smallest unit of Unicode strings.