Application-specific Improvements to the Ecosystem

Chet_Murthy · July 27, 2022, 3:49am

I have this crazy idea of starting a long-running post, wherein people would append examples of small programs they wrote in Python or Perl, and others could respond with OCaml versions. And we could discuss whether the OCaml versions were sufficiently easy-to-write, and if not, why not.

I wonder if anybody else would be interested in that.

ETA: So for instance, here’s my first contribution:

perl -p -i -e '$_ =~ s,\r$,,g' *

What does it do?

for each file that matches *
it reads the file line-by-line
applying the code to each line
And the code? It removes the carriage return (\r) at the end of the line.
and printing the resulting line
and putting that output into a new file
which is then renamed on top of the existing file

That is to say:

“-p” means read each line of input, and after running the code on each line, print out the current line
“-i” means in-place replace the file that was the input
“-e” means the next argument is the code

Implicitly, “-p” (and “-n”) mean that if there are extra arguments, they’re interpreted as filenames and one-by-one read as input.

What’s it for? To take a pile of files from Winders [spit] and remove the carriage returns that end every line in Winders text-files, so the files are fit for UNIX.
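For comparison, here is a minimal OCaml sketch of what "-p -i" plus that substitution amounts to, using only the standard library. The names `strip_cr` and `rewrite_in_place` and the ".tmp" suffix are made up for illustration, and unlike Perl this version always ends the last line with a newline.

```ocaml
(* Drop one trailing carriage return, like s,\r$,, on a line whose
   newline has already been removed. *)
let strip_cr line =
  let n = String.length line in
  if n > 0 && line.[n - 1] = '\r' then String.sub line 0 (n - 1) else line

(* Apply [f] to every line of [path], write the result to a temporary
   file next to it, then rename that file over the original ("-i"). *)
let rewrite_in_place f path =
  let tmp = path ^ ".tmp" in (* hypothetical temp-file naming *)
  let ic = open_in_bin path in
  let oc = open_out_bin tmp in
  (try
     while true do
       (* input_line drops the '\n'; we put it back after transforming *)
       output_string oc (f (input_line ic));
       output_char oc '\n'
     done
   with End_of_file -> ());
  close_in ic;
  close_out oc;
  Sys.rename tmp path
```

Something like `List.iter (rewrite_in_place strip_cr) files` would then play the role of the `*` on the command line. The loop is easy enough; the part that gets no shorter is the substitution itself.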

hyphenrf · July 27, 2022, 5:49am

that’s unfair though, Perl is specifically designed for those write-once shell one-liners. The complete opposite end of the spectrum to OCaml. Trying to push OCaml to its direction is probably gonna end up making it less of what it is(?)

Chet_Murthy · July 27, 2022, 6:47am

I have written a number of somewhat complicated Perl scripts that started off as single-pagers and grew to many hundreds of lines of code. For example, I have a program that manages my library of e-books – manages a database of titles, authors, genres, and then generates a renamed hierarchy of files with standardized names for all books. That program is 2k lines of code. But it started off as a small Perl script.

Another example: there’s a standard kind of logging, entry/exit logging with machine-readable timestamps, that can be used to generate call-trees and eventually to do primitive profiling of systems running in production. A perl script that takes such a logfile, and generates profiles, isn’t too much code. But (it turns out) the developers of many production transaction-processing systems neglect to carefully match up entry/exit log-lines – so you might have the entry but not the exit, or vice versa. So you need to write code to fixup those errors, and that makes the Perl script a bit more complicated.

Here’s one: I call it “cmdg” (as in “command grep”):

#!/usr/bin/perl

while(@ARGV) {
  if ($ARGV[0] eq '-l') {
    shift @ARGV;
    $listonly = 1;
  }
  elsif ($ARGV[0] eq '-i') {
    shift @ARGV;
    $iflag = "-i";
  }
  elsif ($ARGV[0] eq '-v') {
    shift @ARGV;
    $verbose = 1 ;
  }
  else { last; }
}

unless (@ARGV) {
  print STDERR "Usage: $0 [-l] [-i] [-v] <command> <pattern> <files>\n";
  exit 1;
}

$cmd = shift;
$pat = shift;

foreach my $f (@ARGV) {
  print "[$f]\n" if $verbose ;
  my $listed = 0;
  # $iflag is "-i" only when -i was given; otherwise it is empty and grep is case-sensitive
  open(CMD,"$cmd $f | grep $iflag '$pat' |") || die "cannot execute <<$cmd $f>>";
  while(<CMD>) {
    if (!$listonly) {
      print "$f: $_";
    }
    else {
      print "$f\n" if (!$listed);
      $listed = 1;
    }
  }
  close(CMD);
}

What does it do? You run a command on each of a list of files, and for each one, the output of the command is grepped with a supplied pattern; lines that match are printed, prefixed by the filename. Why do you want it? Suppose you have a bunch of tarfiles, and you want to find a tarfile that contains FOO.TXT. Then you invoke

cmdg 'tar zvtf' FOO *.tgz

Is that better? I’d think that this would be germane.
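To make the comparison concrete, here is a rough OCaml sketch of cmdg that sticks to the standard library. It is not a faithful port: a plain substring test stands in for grep's regular expressions, the command's output is captured through a temporary file because the stdlib has no popen (the Unix library's `open_process_in` would be the more natural choice), and all function names are invented for this example.

```ocaml
(* True when [pat] occurs somewhere in [s]. Substring search only:
   an assumption standing in for grep's regular expressions. *)
let contains ~ignore_case pat s =
  let pat, s =
    if ignore_case then (String.lowercase_ascii pat, String.lowercase_ascii s)
    else (pat, s)
  in
  let n = String.length pat and m = String.length s in
  let rec go i = i + n <= m && (String.sub s i n = pat || go (i + 1)) in
  go 0

(* Run "[cmd] [file]" through the shell, capture stdout in a temporary
   file, and return its lines. The exit status is ignored. *)
let command_lines cmd file =
  let tmp = Filename.temp_file "cmdg" ".out" in
  let _status =
    Sys.command
      (Printf.sprintf "%s %s > %s" cmd (Filename.quote file) (Filename.quote tmp))
  in
  let ic = open_in tmp in
  let rec read acc =
    match input_line ic with
    | line -> read (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  let lines = read [] in
  close_in ic;
  Sys.remove tmp;
  lines

(* The lines cmdg would print for one file. *)
let report ~list_only ~ignore_case pat file lines =
  match List.filter (contains ~ignore_case pat) lines with
  | [] -> []
  | _ :: _ when list_only -> [ file ]
  | hits -> List.map (fun l -> file ^ ": " ^ l) hits

let cmdg ?(list_only = false) ?(ignore_case = false) cmd pat files =
  List.iter
    (fun f ->
      command_lines cmd f
      |> report ~list_only ~ignore_case pat f
      |> List.iter print_endline)
    files
```

With that, `cmdg "tar zvtf" "FOO" ["a.tgz"; "b.tgz"]` would correspond to the invocation above. Argument parsing is left out, and it is already about as long as the whole Perl script.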

Chet_Murthy · July 27, 2022, 6:53am

A different reply: sure, the loop, and the handling of input files, the overwriting, etc, are all special-purpose for Perl’s special “-p -i -e” flags. But those could be easily duplicated in OCaml. What isn’t so easy to duplicate, is the regular expression and substitution. Add in extended regular expressions, and it gets downright hard to duplicate. And yet, regular expressions for both matching and substitution, are critical to Perl’s power.

So my example is fair, in the sense that even with such a tiny regexp, OCaml would still probably fall short. Make the regexp more complicated, and the game’s up.

What I’m trying to say is: it’s the regexp, not the file-reading-writing-loop, that’s unreasonably effective here.

beajeanm · July 27, 2022, 8:01am

Don’t get me wrong, I use Dream, and I like it. And as someone who barely contributed a couple of drive-by PRs I’m thankful for all the hard work of the community.
But I would be lying if I said Dream could stand next to the major web frameworks out there. Yes, there are probably no webapps written in Flask or whatever that couldn’t be written in Dream. But be ready to write a lot more code: so many things in other languages are just one import away, while they have to be written from scratch in OCaml…

This conversation reminded me that we had one: ORM, but I doubt I want to bring a camlp4o library into my build nowadays

dbuenzli · July 27, 2022, 8:19am

That’s true.

But you should also see the other side of the coin. In general, in the “first not thinking too much about errors” mindset, I found that you get better error messages by default for the end user. However you are right that it makes it harder for the programmer to find the spot when the time comes to improve the error message.

Moreover once that spot is found the error monad makes it much easier to properly insert the right context and reason about the erroring structure. Exceptions make that needlessly hard, not only because they disrupt your control flow but also because once you suddenly decide to handle an error, none of the surrounding code is ready for it.

Also with time you should get a good gut feeling on where to insert appropriate Result.map_error on the way.

And people thinking that effects will be the new silver bullet should remember that effects remain… effects, with all the confusing behaviour they entail. It seems it took a few years to understand that exceptions should not be used for non-exceptional behaviour. I don’t claim there won’t be better ways, but as a gut feeling I find it suspicious to treat non-exceptional erroring paths as effects.

Embrace errors, don’t pretend they don’t exist, this is not a contamination, it’s pollination.
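As a small illustration of inserting `Result.map_error` on the way up, here is a sketch with string errors. The functions `parse_port`, `find` and `port_of_config` are hypothetical, invented only to have two failing steps to compose.

```ocaml
let ( let* ) = Result.bind

(* Hypothetical low-level step: parse a TCP port number. *)
let parse_port s =
  match int_of_string_opt s with
  | Some p when p > 0 && p < 65536 -> Ok p
  | _ -> Error (Printf.sprintf "invalid port %S" s)

(* Hypothetical lookup in a configuration given as an association list. *)
let find key config =
  match List.assoc_opt key config with
  | Some v -> Ok v
  | None -> Error (Printf.sprintf "missing key %S" key)

(* The caller knows which file is being read, so this is the spot where
   that context gets prepended to whatever error comes from below. *)
let port_of_config ~file config =
  let with_context r = Result.map_error (fun e -> file ^ ": " ^ e) r in
  let* raw = with_context (find "port" config) in
  with_context (parse_port raw)
```

Here `port_of_config ~file:"app.conf" []` evaluates to `Error "app.conf: missing key \"port\""`: the low-level functions stay ignorant of file names, and the context is added in exactly one visible place.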

MaxHaydenChiz · July 27, 2022, 8:20am

Is the point supposed to be that OCaml’s pcre bindings aren’t that great or that pcre itself isn’t that good?

I agree with your concept, but I’m unclear on the meaning of this example.

bluddy · July 27, 2022, 9:33am

So let me just add a little bit to this. A demo was made showing off a prototype of Modular Implicits a few years back, and everyone (including me) got all excited. However, the excitement was premature. Research projects are very hard to predict, and Modular Implicits to the level that @lpw25 describes here are a massive amount of research (and implementation) effort. So while it is indeed ‘a research project’, there are simpler research projects and more ambitious ones, and Modular Implicits is an extremely ambitious project and is therefore unlikely (IMO) to be implemented within any reasonable timeframe. Other projects (such as Algebraic Effects) are hopefully more modest and are plodding along. It’s important for users of the language to understand this in order to set realistic expectations.

bluddy · July 27, 2022, 11:23am

From what I understand, effects are a near one-to-one mapping of monads. Untyped effects are as bad as exceptions (or worse). Typed effects are essentially monads without the need for monadic code.

c-cube · July 27, 2022, 11:47am

@chet_murthy true, rust has ? for some errors (and panic for the rest).

A few distinctions though: ? has access to early return, so you can use it with if/loop/while/inner blocks without propagating an actual monad around.

? also has access to Into, which ocaml simply lacks, for on the spot error conversion. They’re also working on getting back traces automatically from the standard Error trait (another thing OCaml lacks).

Finally, rust has had ? from the very start (with try!). We didn’t. So I think that while result is useful, it’s not a clear winner in all situations. I’m still not totally satisfied with my way of doing things but it’s getting there too, with let@ to add context to a custom error type that can accumulate messages and locations, RAII style. That’s pretty nice if you deal mostly with unrecoverable errors (proof systems, compilers) where you mostly want good messages and good context for the user.
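One way such a let@ scheme could look is sketched below. This is an assumption about the general shape, not the actual code being described: the `Failed` exception, `with_context`, `fail` and the two `check_*` functions are all invented for the example, and locations are left out.

```ocaml
(* Custom error carrying a stack of context messages, outermost first. *)
exception Failed of string list

(* [let@ x = f in body] is just [f (fun x -> body)]. *)
let ( let@ ) f k = f k

(* Run [k ()]; if it raises [Failed], push [msg] on top of the context
   stack and re-raise. The scope of the context is the rest of the
   enclosing block, which is what gives it the RAII flavour. *)
let with_context msg k =
  try k () with Failed ctx -> raise (Failed (msg :: ctx))

let fail msg = raise (Failed [ msg ])

(* Hypothetical checker: returns the length of a non-empty term. *)
let check_term t =
  let@ () = with_context ("checking term " ^ t) in
  if t = "" then fail "empty term" else String.length t

let check_file name terms =
  let@ () = with_context ("in file " ^ name) in
  List.map check_term terms
```

A failure deep inside `check_file "a.ml" ["x"; ""]` surfaces as `Failed ["in file a.ml"; "checking term "; "empty term"]`, so the user sees the whole path to the error without any result type threaded through the code.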

jumpnbrownweasel · July 27, 2022, 3:00pm

A different problem I noticed from my use of Rust is that typing of errors becomes less useful the farther up the stack they’re handled, since the different error types from each level must be unioned together. The common scenario is that library functions return specific error types but at the application level Result<(), Box<dyn Error>> is often used as a catch-all, like Exception or Throwable in Java.

Error handling is still an unsolved problem as far as I can tell! The Scala people are also struggling with it:
https://dotty.epfl.ch/docs/reference/experimental/canthrow.html

The ability to have typed effects in OCaml will allow more flexibility in this regard, especially if effects can be easily coerced to the Result type for people that prefer it. What would be attractive to new users is some sort of error handling story that is explained clearly up front, for example:

OCaml by default uses typed effects to indicate errors. This ensures that all errors are handled explicitly and are visible in each function signature that raises errors, without requiring special syntax to propagate errors.
When monadic error handling is preferred, error effects may be easily coerced to a monadic Result type. The let* syntax provides propagation of errors (and other monadic structures) in the same way as do syntax in Haskell.
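Typed effects are not available in OCaml today, but the Result half of that story can already be sketched. The example below uses let* for propagation and polymorphic variants so that the errors of two layers are unioned by inference rather than by hand; `read_setting`, `parse_int`, `retries` and `describe` are hypothetical names.

```ocaml
let ( let* ) = Result.bind

(* Two hypothetical library layers, each with its own specific error. *)
let read_setting key : (string, [> `Not_found of string ]) result =
  if key = "retries" then Ok "3" else Error (`Not_found key)

let parse_int s : (int, [> `Not_an_int of string ]) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (`Not_an_int s)

(* The inferred error type of [retries] is the union of both layers'
   errors; nothing had to be declared or boxed. *)
let retries () =
  let* raw = read_setting "retries" in
  parse_int raw

(* At the application level the union can still be matched case by case. *)
let describe = function
  | Ok n -> Printf.sprintf "retries = %d" n
  | Error (`Not_found k) -> "missing setting " ^ k
  | Error (`Not_an_int s) -> "not an integer: " ^ s
```

This keeps the specific error cases visible at the top of the stack instead of collapsing them into a catch-all, at the price of error types that grow as more layers are combined.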

MaxHaydenChiz · July 27, 2022, 3:00pm

That GitHub link seems to be the thing I was thinking of.

MaxHaydenChiz · July 27, 2022, 3:10pm

Something I haven’t put much thought into is whether you could use OCaml’s object oriented system to achieve something similar to what Python does with numpy and the like. My concern would be that, currently, using objects blocks optimization and that there may be some weird customized dispatch stuff going on that would effectively require adding multiple dispatch to OCaml in order for the typing to remain sound without similarly obnoxious amounts of boiler plate.

But numeric code isn’t really a high priority for me. As said above, if I had to pick one new feature, it would be seamless Python FFI support in both directions. That greatly alleviates the chicken and egg problem of “no library for X” and also makes it easier for people to use OCaml in situations where C or C++ might normally be used to make the core of a Python library.

(Still, I do hope that modular implicits pans out. The capability would add a lot of value. And, to some extent, I think that code that is best expressed with typeclasses and the like is just better off not being written in OCaml right now.)

Armael · July 27, 2022, 3:45pm

re: python FFI, I seem to remember that at least @thierry-martinez has done quite a bit of work on this. A couple links I could dig up from my browser history (there might be more, I haven’t used the stuff myself): GitHub - thierry-martinez/pyml: OCaml bindings for Python ,
GitHub - thierry-martinez/ocaml-in-python: Effortless Python bindings for OCaml modules (based on pyml),
GitHub - mooreryan/ocaml_python_bindgen: Generate Python bindings via pyml from OCaml value specifications (also based on pyml).

MaxHaydenChiz · July 27, 2022, 3:56pm

Thanks for that. For the next relevant project that comes up, I’ll check it out and see how seamless bindgen has made things and how reliable it is compared to what’s available in other ecosystems.

Armael · July 27, 2022, 4:14pm

Sounds good. I haven’t yet had the need for python interop myself, but I’d be interested to read an experience report if you do try.

MaxHaydenChiz · July 27, 2022, 5:51pm

I’ll write up that report when I do get a project that’s a good fit.

Browsing through the docs, I wish it was a bit more comprehensive and automated since you still have to supply types for the val specs (contrast with Julia’s PyCall), but we gotta start somewhere and I’ll see how much friction there is in practice and how much difference there is between the two once I give it a try with something substantial.

Chet_Murthy · July 28, 2022, 7:02am

We think about “regexp support” as just “bindings”. Or maybe “bindings”+“match construct”. But look at the extensive support in Perl for regular expressions – all the various flags that are available and what they do. And the concise yet transparent way in which these things are invoked. OCaml has nothing to compare. In that sense, yes, the pcre bindings aren’t that great.

But I don’t want to slag on them as bindings: they’re perfectly fine as bindings. What’s missing is language support to make using those bindings as effortless as they are in Perl.

And something else that I must note: that language support is going to necessarily mean stepping outside the current syntax of OCaml. I mean, it’s simply not going to be possible (I think) to provide the same level of effortlessness as you get in Perl, within the current OCaml syntax (hence, via hacking that syntax with PPX). I could be wrong about this. But for instance, sedlex’s PPX syntax incarnation is decidedly less effortless than ocamllex.

n4323 · July 28, 2022, 9:37am

I agree. I use OCaml in a scientific context for simulations – but then I need to export results to CSV or some such and read that from python to do proper visualization, as that is not available in OCaml (No, the plotting facilities currently in Owl are not proper visualization. I need functionality on par with matplotlib). This workflow feels clunky compared to a Jupyter notebook, and really is somewhat painful.

Now if I could make a wrapper to drive my OCaml simulation from python with almost no effort and automatic conversion of arrays/bigarrays to numpy arrays, that pain would go away. I’ve been wanting to try pyml and friends but it did still appear to require some non-negligible wrapping work.

In an ideal world, most of the numerical / data science ecosystem around python would have bindings for OCaml, but that’s not going to happen – the next best thing really is painless python interoperability.

I disagree with the notion put forward elsewhere that static typing makes OCaml unsuitable for exploratory numerical/data science work. Once a scientific program evolves over more than a week or so, being forced to structure the code sensibly by the type system already pays off IME.
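For reference, the CSV export step of the workflow described above needs little code on the OCaml side. This is only a sketch: `simulate` is a made-up stand-in for a real simulation, and the output is plain comma-separated floats with no quoting, which is enough for numeric columns.

```ocaml
(* Format one row of floats as a CSV line. "%.17g" prints enough digits
   for a double to survive the round trip through text. *)
let csv_row values =
  String.concat "," (List.map (Printf.sprintf "%.17g") values)

(* Write a header line followed by one line per row to [path]. *)
let write_csv path ~header rows =
  let oc = open_out path in
  output_string oc (String.concat "," header ^ "\n");
  List.iter (fun row -> output_string oc (csv_row row ^ "\n")) rows;
  close_out oc

(* Hypothetical simulation: exponential decay sampled at fixed steps,
   one [t; y] row per step. *)
let simulate ~steps ~dt =
  List.init steps (fun i ->
      let t = float_of_int i *. dt in
      [ t; exp (-.t) ])
```

A call such as `write_csv "decay.csv" ~header:["t"; "y"] (simulate ~steps:100 ~dt:0.1)` produces a file that the Python side can load directly. The clunky part is the round trip through files, not the amount of code.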

jrfondren · July 28, 2022, 10:13am

it’s … unfair? It is what it is. Whose feelings are we trying to spare by caring about fairness? Perl didn’t stumble into being good for oneliners, it saw them in awk and imitated the required features. Ruby did the same, and has the same flags as Perl. The D programming language was trying to be a better C++, and not specifically designed for one-liners, but it comes with ‘rdmd’, a little wrapper for the D compiler that can be used as such:

$ rdmd --loop='line.replaceFirst("\r$", "").writeln'
hi^M
hi
there^M
there

where the lines with ctrl-M endings are my input. This isn’t exactly the same usage as Perl (which is also using the <ARGV> feature: it defaults to stdin for input but, given arguments, takes each of them as a filename to read from) but it’s convenient enough that you might write it instead of Perl.

With the same caveat:

#! /usr/bin/env ocamlscript
Ocaml.packs := ["str"]
--
let rec pie file f =
  match In_channel.input_line file with
  | Some line ->
    print_endline (f line);
    pie file f
  | None -> ()

let () =
  let dosline = Str.regexp "\r$" in
  pie stdin (Str.replace_first dosline "")

This is slightly annoying as ocamlscript requires camlp4, which requires ocaml <= 4.14.0, and my use of In_channel requires 4.14.0, so you practically need a whole opam switch just for this one script. Building with dune saves you from the camlp4 requirement but now you have a bunch of files instead of Perl’s zero files for the task. What I’d suggest is either extending or competing with ocamlscript and adding flags for some usage like

rocaml --let=dosline='Str.regexp "\r$"' --loop='Str.replace_first dosline ""'

No design changes needed to the language. Even without such a tool, sysadmins don’t sleepwalk into creating git repos and starting to make more polished, safer, more maintainable scripts. At least the process of creating the git repo usually wakes them up. So they’re no longer as interested in oneliners but are still interested in the functionality, and here even the full dune implementation of dos2unix might be compelling enough: GitHub - jrfondren/dos2unix-ocaml: An example dos2unix in OCaml
