Using OCaml for scripting with non-core libraries and editing support

arvidj · September 27, 2024, 7:14pm

Hi! I’d like to use OCaml for scripting in situations where I’d otherwise reach for a shell script. I’d like to avoid writing any dune-file boilerplate.

I’ve figured out that I can run .ml files directly through the ocaml toplevel (ocaml my_script.ml). However, you quickly run into situations where you need non-core libraries (unix, re, …). For this, I’ve understood that you can use the top-level directives to load ocamlfind (which can be installed by e.g. opam) which in turn adds the #require directive with which you can load libraries in a portable manner:

(* Load the [#require] directive from topfind aka
   findlib aka ocamlfind: *)
#use "topfind"

(* Load the [re] library: *)
#require "re"

(* Load the [unix] library: *)
#require "unix"

let () = (* [Re] and [Unix] are now available ... *)
         ...

The disadvantage is that, as reported here and here, this file can no longer be analyzed with merlin, which reports Syntax error on the top-level directives used in the file above. This makes editing such scripts quite a pain, since you lose all the conveniences that merlin provides.

A workaround is posted in one of the issues linked above. Instead of using the top-level directives, their definitions can be in-lined in the script:

(* Equivalent to [#use "topfind"] *)
let () = Topdirs.dir_use Format.std_formatter "topfind"

(* Equivalent to [#require "re"] and [#require "unix"]: *)
let () = Topfind.load_deeply [ "re"; "unix" ]

let () = (* [Re] and [Unix] are now available as above ... *)
         ...

(I think this might require installing ocaml-compiler-libs.toplevel first though.)

At this point, there is only one final hurdle to jump through to make the file analyzable by merlin. Merlin can’t figure out on its own that the re and unix libraries are used in the script. For this, we can add a PKG directive to a .merlin file in the same folder as the script:

PKG findlib ocaml-compiler-libs.toplevel unix re

In conclusion,

by using the inlined definitions of the top-level directives #use (to load topfind) and #require (to load non-core libraries), and
by telling merlin about the non-core libraries used through a PKG directive in the .merlin file,

we can now run the script with ocaml my_script.ml and edit it directly in e.g. emacs with the IDE capabilities provided by merlin.
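Putting the pieces together, a complete script might look like the sketch below. It assumes the findlib, re and unix packages are installed and that the file is run with the ocaml toplevel; the has_number helper is made up for illustration.

```ocaml
(* my_script.ml: run with [ocaml my_script.ml].
   Assumes the findlib, re and unix packages are installed. *)

(* Equivalent to [#use "topfind"] *)
let () = Topdirs.dir_use Format.std_formatter "topfind"

(* Equivalent to [#require "re"] and [#require "unix"] *)
let () = Topfind.load_deeply [ "re"; "unix" ]

(* Hypothetical helper: does [line] contain at least one digit? *)
let has_number =
  let re = Re.compile (Re.rep1 Re.digit) in
  fun line -> Re.execp re line

let () =
  Printf.printf "pid %d: has_number \"abc123\" = %b\n"
    (Unix.getpid ())
    (has_number "abc123")
```

This works because the toplevel type-checks and runs the file one definition at a time, so Re and Unix are already loaded when the later definitions are checked. With the PKG line from above in .merlin, merlin should resolve Topdirs, Topfind, Re and Unix as well.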

Now, to my question: are there any easier ways to achieve the above, i.e. single-file scripts that are easy to run without compilation and support merlin-powered editing?

I know about:

but I haven’t yet had the time to figure out how they can help me.

Frederic_Loyer · September 27, 2024, 9:33pm

With ocamlscript, there is a merlin compatible example:

github.com/ocaml-community/ocamlscript/blob/master/examples/no-merlin-errors.ml

#!/usr/bin/env ocamlscript
let open Ocamlscript.Std in (** Inclues Ocamlscript and special (--) operator *)
begin 
  Ocaml.packs := ["cmdliner"]
end
-- 
(* ^^^ opened as infix operator here, returning a unit.
 * Must be on its own line! *)
() (* need to close out the -- operator *)

let f x y = x + 1 (* can parse basic staements after the () *)

let arg_info = Cmdliner.Arg.info (* ensure packs are present *)

let () = print_endline "look ma, no merlin errors!"

In the preamble, only Ocaml.packs := [ … ] is required to load modules. The let open Ocamlscript.Std and the () should be there to help merlin. EDIT: but I couldn’t make merlin happy (it doesn’t find my modules, Ocamlscript.Std included!)

Note that it is not “without compilation”, but with an implicit compilation. The first run after editing compiles the script; later runs compare the dates and directly launch the executable stored in the same directory (like Python with .py and .pyc).
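The date comparison can be sketched as follows. This is only an illustration of the idea, not ocamlscript’s actual code; needs_rebuild is a made-up name, and the times stand for modification times in seconds (for instance the st_mtime field returned by Unix.stat).

```ocaml
(* Hypothetical sketch of a "recompile only if the source is newer" check.
   Not ocamlscript's actual implementation. *)

(* [exe_mtime] is [None] when no executable has been built yet. *)
let needs_rebuild ~source_mtime ~exe_mtime =
  match exe_mtime with
  | None -> true
  | Some built -> source_mtime > built

let () =
  let show label b = Printf.printf "%s: %b\n" label b in
  show "first run" (needs_rebuild ~source_mtime:100. ~exe_mtime:None);
  show "script edited"
    (needs_rebuild ~source_mtime:200. ~exe_mtime:(Some 150.));
  show "up to date"
    (needs_rebuild ~source_mtime:100. ~exe_mtime:(Some 150.))
```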

b0caml uses the same kind of syntax (# directives), and its documentation says:

We hope to eventually convince ocamlmerlin to understand #directory directives and abide by OCAMLPATH the way b0caml does. This will have merlin work out of the box in your script without having to specify anything. If you are using #mod_use you will be punished accordingly.

jbeckford · September 27, 2024, 10:07pm

(I’m the author of DkCoder)

DkCoder is project-based today, unlike the two others. DkCoder needs a directory structure so it can build the artifacts necessary for ocamllsp. You can place a single script into the directory structure, but at the root of the directory structure are three bootstrapping files (very much like the Gradle wrapper). Don’t overlook that even in your example you are writing a .merlin file, which presupposes a project with at least two files.

The second difference is that DkCoder is a binary distribution. That is both a strength and a weakness. It does mean that a lot of functionality (Re, Tiny_http, Cohttp_curl, etc.) is available and has been compiled to work well on desktops (SSL, graphics). It also means that you can’t just add in your own OCaml packages.

For unrelated reasons I’ll be adding an option to create Merlin files very soon. But it isn’t needed for ocamllsp support because under the hood DkCoder uses Dune as its compiler.

bhoot · September 28, 2024, 12:01pm

Wow. Just a couple of weeks ago, I was on the same track – check the viability of OCaml for single-file scripting (i.e., specify dependencies within the script). I documented my process partially: poc-json-to-atom-feed/README.md at main · jbhoot/poc-json-to-atom-feed · GitHub

The exploration space in my case was:

I took on this task to answer a question - which statically typed language can fill that niche of single-file scripting which lies between too-large-for-bash and still-too-small-to-be-a-program? Different aspects were taken into consideration - single-file, which meant in-file dependencies; fast feedback loop; built-in watcher facility; execution time; compilation time; and others.

I stopped before you did, though.

Compared to the other candidates - Scala, Go - OCaml fell behind in terms of editor tooling. Scala’s CLI, in contrast, even installs the specified inline dependencies on the first run, and has a built-in watcher.

I too haven’t been able to see how b0caml or ocamlscript can simplify this process. Though dbuenzli makes it quite clear in b0caml’s docs:

Write programs not scripts.

DkCoder looks like a toolkit.

dbuenzli · September 28, 2024, 12:29pm

For now I’d advise not to use b0caml, it was written with the assumption that a simple directory-based notion of library would make it upstream. But that never happened, so it likely needs adjustments to be more compatible with the dreadful status quo.

I think the best way is to do what you suggest: use ocamlfind. The advantage is that you will also find the ocaml/ocamlfind combo in the system packages of many linux distributions. ocaml could have been an excellent system “scripting” language but, as usual, totally missed out on providing the integration mechanisms and experience that would have been needed to do so.

Also, if you don’t compile them like b0caml does, you may still want to check that your scripts compile correctly, for which you can use this hack, and of course the merlin issue remains. Yes, all this is clunky beyond sadness, but people mostly don’t care about things that do not fit the narrow-minded dune narrative nowadays.

dbuenzli · September 28, 2024, 1:23pm

There are two orthogonal components in b0caml:

1. What are the meta-linguistic needs to make it convenient to reach for OCaml when you want to write a “script”.
2. Transparent compilation of your “script” to a native executable.

The result of point 1., which could have been upstreamed at some point, is here in this system. Now as a bonus of point 1. you get dependency tracking and queries for your script (b0caml deps).

So if you have a good scoped library integration at the system level bootstrapped from an ocaml system package install and a simple mechanism to install libraries in /usr/local/ or .local prefixes using a non-system package manager it becomes a breeze to ship a script to a machine and set up the environment in order to execute it (think opam install $(b0caml deps myscript.ml) though that exact invocation wouldn’t work for all sorts of reasons :-).
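To give a rough idea of what such a dependency query could look like for the ocamlfind-style scripts discussed earlier in the thread, here is a minimal sketch. It is not how b0caml works: the function names are hypothetical, and it only scans for lines of the form #require "pkg" or #require "pkg1,pkg2".

```ocaml
(* Hypothetical helper, not part of b0caml or ocamlfind: collect the package
   names mentioned in [#require "..."] lines of a script. *)

(* [requires_of_line line] returns the packages named by a [#require]
   directive on [line], or [] if the line is not such a directive. *)
let requires_of_line line =
  let line = String.trim line in
  let prefix = "#require" in
  let plen = String.length prefix in
  if String.length line < plen || String.sub line 0 plen <> prefix then []
  else
    match String.split_on_char '"' line with
    | _ :: names :: _ ->
        String.split_on_char ',' names
        |> List.map String.trim
        |> List.filter (fun s -> s <> "")
    | _ -> []

(* [deps_of_lines lines] gathers the packages of all directives,
   without duplicates, in first-seen order. *)
let deps_of_lines lines =
  List.concat_map requires_of_line lines
  |> List.fold_left (fun acc p -> if List.mem p acc then acc else p :: acc) []
  |> List.rev

let () =
  let script =
    [ "#use \"topfind\""; "#require \"re\""; "#require \"unix,re\"";
      "let () = ()" ]
  in
  print_endline (String.concat " " (deps_of_lines script))
```

The space-separated output is the kind of list that could be handed to a package manager, with the same caveats as above about that not working as-is.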

I’m not sure you got my point here. My point is that I don’t see the difference between programs and scripts, except that in practice the latter tend to be brittle duct-taped programs (mainly due to the linguistic disasters that sh-derived languages are). Lots of “scripts” turn quickly into full-fledged programs running critical bits of infrastructure. So one of the aims of b0caml was also to streamline or blur the steps when you need to turn your “script” into a “program” or vice-versa.

dbuenzli · September 28, 2024, 1:58pm

I forgot to add that, to work around the lack of good system integration, you can simply use opam. Make a special, carefully curated switch on your system, say scripts, for the packages needed by your scripts and use this shebang line in your scripts:

#!/usr/bin/env -S opam exec --switch=scripts -- ocaml

Here’s an example of a github synchronisation webhook using this technique.

jbeckford · September 28, 2024, 11:23pm

I rarely see that word (“toolkit”) used outside of auto mechanics, and the only thing that sprung to mind was Tcl/Tk. And since I have never used it, I had to spend some time seeing what Tcl/Tk did. Is that what you meant by “toolkit”? If so, DkCoder is very much a scripting toolkit. (Perhaps it should have been named Ml/Dk?)

bhoot · September 29, 2024, 3:53am

I took the name from the Scala Toolkit, which pre-packages a few utility libraries for common programming tasks.

bhoot · September 29, 2024, 4:02am

Oh, when I quoted your assertion, I had in mind what you explained:

I don’t see the difference between programs and scripts, except that in practice the latter tends to be brittle duct taped programs

Lots of “scripts” turn quickly into full-fledged programs running critical bits of infrastructure.

It’s great that b0caml aims to make the transition from script to program seamless. It should be an integral attribute for a script.

#!/usr/bin/env -S opam exec --switch=scripts -- ocaml

Great tip. Thank you!

jbeckford · September 30, 2024, 7:34pm

Sigh. I have done a poor job of describing it then. I’ll fix up the intros and descriptions as I do more releases. For now, DkCoder is primarily a build system that treats .ml files as scripts. And secondarily, because it repurposes OCaml module names, it knows which libraries are needed for your scripts with little or no configuration. It will be obvious when I release DkCoder versions that can a) download external libraries with your choice of inline annotation or .json config and b) produce a shared library or .jar file that DkCoder is not at all like Scala Toolkit.

pm5 · October 22, 2024, 5:30am

If it is only unix that you need, since the unix subdirectory is automatically added to the search path (I’m not sure what that means actually but there was a message that said so), you can also just do:

#!/usr/bin/env -S ocaml unix.cma -I +unix

Here is a script that does it this way.
