Is there ever a time when it makes sense to choose Go over OCaml?

Well, “dynamically-typed”, sure. Not untyped. And in the code I shared, every variable has one and only one type. I never write Perl in any other manner.

The reason I shared that code was so that somebody could take a crack at writing it in OCaml. I have a couple of other scripts (like this one: pa_ppx/LAUNCH at master · camlp5/pa_ppx · GitHub) that I use a lot. The fact that it’s a Perl script makes portability (to macOS Homebrew, feh) a pain, and some people suggested I could rewrite this script in OCaml.

So if the response is “well, OCaml will never be useful to write scripts like this, as quickly as you wrote this one” then … that’s the same as saying that OCaml isn’t interested in a certain class of programming.

Many complex programs start off as small, simple tools, growing by accretion and extension into behemoths. Languages that are too cumbersome to be useful at the beginning face a barrier to adoption.

I once wrote an aspect-oriented code-injection tool as a (somewhat complex) Perl program. It started off pretty simple.


I thought for a while, and I think you’re missing the real point: what makes Perl powerful, what makes it such a magic UNIX-programming engine, is that it has high-quality support for text. A lot of UNIX is text, and Perl is excellent at dealing with text – both eating it, and producing it. Every language has things it’s good at and things it’s not so good at. ML-family languages are good at constructor data-types, for example. And Perl is good at text.

There’s a class of programs that are all about text. OCaml is terrible for this class of programs. That’s really the problem.

@Chet_Murthy asked if I could write a quick translation of his Perl script into OCaml for comparison purposes. It is more or less meant to be a direct translation of Chet’s Perl into idiomatic OCaml, though I moved the “argument parsing” into its own function because I wanted to do it with recursion, and I used let () = for the main sequence of computations because I’m not a barbarian.

I haven’t tested it because I don’t have a use case, but it compiles, so it’s probably close if not correct.

I did pull in the Re library for this. I guess it is also possible with Str in the standard library, but everyone says not to use that.

let rec split_args cmd = function
  | "--" :: files -> List.rev cmd, files
  | [file] -> List.rev cmd, [file]
  | arg :: args -> split_args (arg :: cmd) args
  | [] -> failwith "please supply input arguments"
let split_args = split_args []

let comment_pattern = Re.Perl.compile_pat "^\\(\\*\\*(.*?)\\*\\)"

let discover_args f =
  let f' = open_in f in
  let line1 = input_line f' in
  close_in f';
  match Re.exec_opt comment_pattern line1 with
  | None -> ""
  | Some group -> Re.Group.get group 1

let () = 
  let cmd, files =
    Array.to_list Sys.argv |> List.tl |> split_args in
  let cmd = Filename.quote_command (List.hd cmd) (List.tl cmd) in

  List.iter (fun f ->
      let extra = discover_args f in
      let cmd = Printf.sprintf "%s %s %s" cmd extra f in
      (* %! flushes, so the echoed command appears before the command's own output *)
      Printf.eprintf "%s\n%!" cmd;
      ignore (Sys.command cmd))
    files

I think that OCaml acquits itself fairly well in this short comparison. The regex handling is not as nice—doubling up on all those backslashes—but the argument parsing is arguably a bit cleaner thanks to pattern matching. The rest is more or less similar.
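Since portability was the reason for porting in the first place, here is a sketch of the same first-line lookup with no Re dependency, using only the Stdlib. It assumes the comment opens at column 0, as the ^ in the original pattern requires; comment_args is a name made up for this sketch, and the End_of_file guard for empty files is an addition.

```ocaml
(* Stdlib-only stand-in for comment_pattern plus Re.exec_opt. If [line]
   starts with the three characters of an OCaml doc-comment opener, return
   everything up to the first closing star-paren pair; otherwise "". *)
let comment_args line =
  let prefix = "(**" in
  let plen = String.length prefix in
  let n = String.length line in
  if n < plen || String.sub line 0 plen <> prefix then ""
  else
    (* scan for the first star immediately followed by a close paren *)
    let rec find_close i =
      if i + 1 >= n then None
      else if line.[i] = '*' && line.[i + 1] = ')' then Some i
      else find_close (i + 1)
    in
    match find_close plen with
    | Some j -> String.sub line plen (j - plen)
    | None -> ""

(* Drop-in replacement for discover_args without the Re library;
   an empty file yields no extra arguments instead of raising. *)
let discover_args f =
  let ic = open_in f in
  let line1 = try input_line ic with End_of_file -> "" in
  close_in ic;
  comment_args line1
```

The hand-written scan is longer than the one-line regex, which rather supports the point made above about regex ergonomics; the trade is one less dependency to install.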


You can use a raw string:

let comment_pattern = Re.Perl.compile_pat {|^\(\*\*(.*?)\*\)|}

You could use a {| string literal\like this\one |} to decrease the amount of escaping required, I think.
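A quick check that both spellings of the regex denote the same string, plus the delimiter-identifier form for text that itself contains a bar followed by a brace. This is a minimal sketch; escaped, raw and with_id are illustrative names.

```ocaml
(* The regex from this thread, spelled two ways. In an ordinary string
   literal every backslash is doubled; in a quoted string literal written
   with curly braces and bars, backslashes are taken literally. *)
let escaped = "^\\(\\*\\*(.*?)\\*\\)"
let raw = {|^\(\*\*(.*?)\*\)|}

(* If the text itself contains a bar followed by a close brace, put a
   lowercase identifier between the brace and the bar on both ends. *)
let with_id = {re|a|}b|re}

let () =
  assert (escaped = raw);
  assert (with_id = "a|}b");
  print_endline raw
```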


I’m kind of bemused that you think OCaml is terrible for dealing with text, given that its raison d’être is writing compilers, by definition a task that involves eating and producing text (or a byte stream, depending on the compiler target).

Using regular expressions in Perl is a bit prettier than in OCaml, but I think pattern matching is a great feature for a lot of text-oriented jobs.
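As a small illustration of that point (a hypothetical example, not code from this thread): once a line is split into words, ordinary pattern matching can classify the lines of a findlib META file. The meta_line type and the classify and unquote functions are invented for this sketch.

```ocaml
(* Hypothetical example: classify lines of a findlib META file by matching
   on their space-separated words. *)
type meta_line =
  | Field of string * string (* name = "value" *)
  | Package of string (* package "name", then an open paren *)
  | Close (* a line holding only a close paren *)
  | Other of string

(* Strip one pair of surrounding double quotes, if present. *)
let unquote s =
  let n = String.length s in
  if n >= 2 && s.[0] = '"' && s.[n - 1] = '"' then String.sub s 1 (n - 2) else s

let classify line =
  let words =
    String.split_on_char ' ' (String.trim line) |> List.filter (fun w -> w <> "")
  in
  match words with
  | [ ")" ] -> Close
  | [ "package"; name; "(" ] -> Package (unquote name)
  | name :: "=" :: rest -> Field (name, unquote (String.concat " " rest))
  | _ -> Other line
```

Each case reads like the shape of the line it accepts, which is the property that makes matching pleasant for this kind of job; the price is that splitting on single spaces is cruder than a real tokenizer.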

Thanks! I’ll remember that for next time.

I don’t buy the argument (if that is your argument) that OCaml has to be more perl-like to become less niche. Perl and OCaml exist for the most part in different namespaces. Perl is a dynamically typed scripting language which is great at text manipulation and for ending up with somewhat inscrutable code. OCaml is a statically, strictly typed language which (although it has an interpreter which can be used for scripting) is predominantly a compiled language.

OCaml has, I think, occasionally succumbed to unnecessary inscrutability (personally I still can’t see the point of introducing let-punning for the let binding operators, and I struggle with some of the more obscure uses of GADTs), but by and large it holds the line and seems to me to be heading in the right direction.


Aaron, nicely done! I’ll take the next step, rewrite another script in OCaml, and post it back here.
BTW, I was unaware of Filename.quote. Nice! The lack of that was keeping me from rewriting another script.


Yes. I only discovered Filename.quote/quote_command from reading the documentation to Sys.command. It’s a bit awkwardly placed, in a module called “Filename”—an essential module if you want to do stuff with the filesystem, though.
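A minimal sketch of Filename.quote_command in use, assuming a Unix system (the quoting rules differ on Windows). It also quotes the file name, which the wrapper script earlier in the thread passes to Sys.command unquoted, so a path containing spaces would break there. build_command is a hypothetical helper, not part of any library.

```ocaml
(* Sketch: build the command line the way the wrapper script does, but also
   quote the file name. [extra] is spliced in unquoted on purpose, because it
   holds several shell words taken from the first-line comment of the file. *)
let build_command cmd_words extra file =
  match cmd_words with
  | [] -> invalid_arg "build_command: empty command"
  | prog :: args ->
    let cmd = Filename.quote_command prog args in
    String.concat " "
      (List.filter (fun s -> s <> "") [ cmd; extra; Filename.quote file ])

let () =
  (* On Unix this prints: 'ocamlfind' 'ocamlc' '-c' -package re 'my file.ml' *)
  print_endline
    (build_command [ "ocamlfind"; "ocamlc"; "-c" ] "-package re" "my file.ml")
```

quote_command also takes optional ?stdin, ?stdout and ?stderr file names and appends the matching shell redirections, which covers another common reason for hand-assembling command strings.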

Xavier Leroy and Didier Rémy actually wrote a book about working with Unix systems in OCaml: Unix system programming in OCaml. My impression is that the OCaml standard library was written with this use case in mind, though perhaps the language itself was not, at least not in the same way Perl was.

  1. with ppx_regexp:
match%pcre  (* or function%pcre *)
| {|re1|} -> e1
...
| {|reN|} -> eN
| _ -> e0

There is ppx_tyre which is similar.

  2. String interpolation, with ppx_string:
let script_remotely (user : string) (host : string) (port : int) (script : string) =
  [%string "ssh %{user}@%{host} -p %{port#Int} %{Sys.quote script}"]
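For comparison, the same command line with only the Stdlib. The ppx_string example appears to assume Jane Street's Core, which is where Sys.quote comes from; this sketch substitutes Filename.quote from the Stdlib, which on Unix quotes for the POSIX shell.

```ocaml
(* Stdlib-only counterpart of the ppx_string example: Printf does the
   interpolation and Filename.quote stands in for Core's Sys.quote. *)
let script_remotely (user : string) (host : string) (port : int) (script : string) =
  Printf.sprintf "ssh %s@%s -p %d %s" user host port (Filename.quote script)
```

The ppx version keeps each value next to its position in the string, which matters more as the template grows; the Printf version needs nothing beyond the compiler.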

I’ve used the Unix module a ton. And for writing systems code (e.g. distributed systems, low-level database, etc) it’s great. But for UNIX scripting, I’ve always found OCaml’s lack of nice facilities for dealing with text, and with the filesystem hierarchy (filenames, directories, and computations over them) to be a big problem. bos helps a lot with the latter, and I’ve found that to be helpful.
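On the filesystem side, the Stdlib does at least cover a basic recursive walk. A sketch using only Sys.readdir, Sys.is_directory and Filename; find_files is a made-up name, and the sketch does nothing clever about symbolic links.

```ocaml
(* Recursively collect the files under [dir] whose names end with [ext].
   Sys.is_directory follows symbolic links, so a symlink cycle would make
   this recurse forever. *)
let rec find_files ~ext dir =
  Sys.readdir dir
  |> Array.to_list
  |> List.concat_map (fun name ->
         let path = Filename.concat dir name in
         if Sys.is_directory path then find_files ~ext path
         else if Filename.check_suffix name ext then [ path ]
         else [])
```

For example, find_files ~ext:".pl" "." would list the Perl scripts under the current directory. Sys.readdir returns entries in no particular order, so sort the result if order matters.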

Purely for amusement, here’s another Perl script. It’s pretty much the first one I ever “wrote”. That is to say, I copied it from Randal Schwartz’s “Programming Perl” book back in … 1995, put it in my bin directory, and have never modified it since. It’s called fixin, and you call it thus:

$ fixin foo/bar fuzz.pl

where foo/bar and fuzz.pl are filenames. These are files that contain scripts, and we assume each starts with a “shebang” (#!blabla) line. And the script’s job is to rewrite that line so that the interpreter specified on the line exists in the current environment. So:

  1. suppose your bash resides in /usr/bin/bash
  2. suppose you write a script foo.sh with the “shebang” line #!/usr/local/bin/bash

Why is it /usr/local/bin/bash? Because that’s where bash lived on the machine where you wrote the script. But now you’re using it on a different machine, where it lives in the standard location (grin).

  3. then when you run foo.sh, you’ll get an error:
$ ./foo.sh
bash: ./foo.sh: /usr/local/bin/bash: bad interpreter: No such file or directory
  4. but then run fixin on it:
$ fixin foo.sh
Ignoring /bin/bash
Changing foo.sh to /usr/bin/bash

and now

  5. run foo.sh again, successfully:

$ ./foo.sh
foo

Now, obviously this was written originally to do the job for your Perl scripts, and back then there was no guarantee Perl was installed on your machine (certainly not on AIX/Solaris machines). And this was before the idiom of #!/usr/bin/env perl became common. Anyway, here’s the script: you’ll notice that it’s … in a somewhat archaic idiom of Perl – if I were to rewrite it, I’d do it in a much more modern way. For instance, it doesn’t scope its variables – Perl 5 was released in late 1994, so most Perl scripts didn’t use my (for lexical scoping).

#!/usr/bin/perl

# Usage: fixin [-s] [files]

# Configuration constants.

$does_shbang = 1;       # Does kernel recognize #! hack?
$verbose = 1;           # Default to verbose

# Construct list of directories to search.

@absdirs = reverse grep(m!^/!, split(/:/, $ENV{'PATH'}, 999));

# Process command line arguments.

if ($ARGV[0] eq '-s') {
    shift;
    $verbose = 0;
}

die "Usage: fixin [-s] [files]\n" unless @ARGV || !-t;

@ARGV = '-' unless @ARGV;

# Now do each file.

FILE: foreach $filename (@ARGV) {
    open(IN, $filename) ||
	((warn "Can't process $filename: $!\n"), next);
    $_ = <IN>;
    next FILE unless /^#!/;     # Not a shbang file.

    # Now figure out the interpreter name.

    chop($cmd = $_);
    $cmd =~ s/^#! *//;
    ($cmd,$arg) = split(' ', $cmd, 2);
    $cmd =~ s!^.*/!!;

    # Now look (in reverse) for interpreter in absolute PATH.

    $found = '';
    foreach $dir (@absdirs) {
	if (-x "$dir/$cmd") {
	    warn "Ignoring $found\n" if $verbose && $found;
	    $found = "$dir/$cmd";
	}
    }

    # Figure out how to invoke interpreter on this machine.

    if ($found) {
	warn "Changing $filename to $found\n" if $verbose;
	if ($does_shbang) {
	    $_ = "#!$found";
	    $_ .= ' ' . $arg if $arg ne '';
	    $_ .= "\n";
	}
	else {
	    $_ = <<EOF;
:
eval 'exec $found $arg -S \$0 \${1+"\$@"}'
    if \$running_under_some_shell;
EOF
	}
    }
    else {
	warn "Can't find $cmd in PATH, $filename unchanged\n"
	    if $verbose;
	next FILE;
    }

    # Make new file if necessary.

    if ($filename eq '-') {
	select(STDOUT);
    }
    else {
	rename($filename, "$filename.bak")
	    || ((warn "Can't modify $filename"), next FILE);
	open(OUT,">$filename")
	    || die "Can't create new $filename: $!\n";
	($dev,$ino,$mode) = stat IN;
	$mode = 0755 unless $dev;
	chmod $mode, $filename;
	select(OUT);
    }

    # Print out the new #! line (or equivalent).

    print;

    # Copy the rest of the file.

    while (<IN>) {
	print;
    }
    close IN;
    close OUT;
}
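Since the thread is about porting scripts like this, here is a partial sketch of fixin's core in OCaml, Stdlib only. Several simplifications are assumptions of this sketch, not features of the Perl: the executable test is passed in as a function, there is no -s flag, no "Ignoring" messages, no "-" (stdin) mode, no eval/exec fallback for kernels without #! support, and the rewritten file is created with mode 0o755 rather than copying the original mode (reading a file's mode needs the Unix library). parse_shebang, find_interpreter, new_shebang, read_file and fixin_file are names invented here.

```ocaml
(* "#! /usr/local/bin/bash -e" gives Some ("bash", "-e"); None otherwise. *)
let parse_shebang line =
  let line = String.trim line in
  let n = String.length line in
  if n < 2 || String.sub line 0 2 <> "#!" then None
  else
    let rest =
      String.map (fun c -> if c = '\t' then ' ' else c)
        (String.trim (String.sub line 2 (n - 2)))
    in
    if rest = "" then None
    else
      (* split into interpreter path and the rest of the line *)
      let cmd, arg =
        match String.index_opt rest ' ' with
        | None -> (rest, "")
        | Some i ->
          ( String.sub rest 0 i,
            String.trim (String.sub rest (i + 1) (String.length rest - i - 1)) )
      in
      Some (Filename.basename cmd, arg)

(* First absolute PATH entry holding [cmd]; this is the entry the Perl loop
   ends up keeping after scanning the directories in reverse. *)
let find_interpreter ~is_executable path cmd =
  String.split_on_char ':' path
  |> List.filter (fun d -> d <> "" && d.[0] = '/')
  |> List.map (fun d -> d ^ "/" ^ cmd)
  |> List.find_opt is_executable

let new_shebang found arg =
  "#!" ^ found ^ (if arg = "" then "" else " " ^ arg) ^ "\n"

let read_file path =
  let ic = open_in_bin path in
  let s = really_input_string ic (in_channel_length ic) in
  close_in ic;
  s

(* Returns true if [filename] was rewritten; the old copy is kept as .bak *)
let fixin_file ~is_executable ~path filename =
  let contents = read_file filename in
  let first, body =
    match String.index_opt contents '\n' with
    | Some i ->
      ( String.sub contents 0 i,
        String.sub contents (i + 1) (String.length contents - i - 1) )
    | None -> (contents, "")
  in
  match parse_shebang first with
  | None -> false (* not a shebang file *)
  | Some (cmd, arg) -> (
    match find_interpreter ~is_executable path cmd with
    | None ->
      Printf.eprintf "Can't find %s in PATH, %s unchanged\n" cmd filename;
      false
    | Some found ->
      Printf.eprintf "Changing %s to %s\n" filename found;
      Sys.rename filename (filename ^ ".bak");
      let oc =
        open_out_gen [ Open_wronly; Open_creat; Open_trunc; Open_binary ] 0o755 filename
      in
      output_string oc (new_shebang found arg);
      output_string oc body;
      close_out oc;
      true)
```

On a real system you would call it as fixin_file ~is_executable ~path:(Sys.getenv "PATH") filename, with an executable test built on something like Unix.access file [Unix.X_OK] from the Unix library. Passing the test in as a parameter keeps the rest of the logic free of that dependency and easy to exercise.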

Ironically, the Str module documentation mentions quoted string literals which can be used to avoid backslash hell :wink:


You just used the wrong string literal syntax there.

Alright guys, I already got the message about the raw strings after the first person mentioned it. I’m all good to go now. :+1:


Oh, nice. I’m going to try these out. Do you have pointers to other PPX rewriters that are useful when dealing with strings?
ETA: and looking at this, I hunger for modular implicits to implicitly perform the conversions to string from various types. Ugh, I so hunger for them.


I guess ppx_deriving (GitHub - ocaml-ppx/ppx_deriving: Type-driven code generation for OCaml), and especially [@@deriving show], could help you. There are more derivers on the same page. This kind of ppx is partially covered by Real World OCaml, though with other goodies (compare, sexp…).

(grin) I knew about ppx_deriving and the many derivers – I’ve implemented Camlp5-based versions of many of them ( GitHub - camlp5/pa_ppx: Implementation of PPX rewriters using camlp5 infrastructure ), as well as new ones in the same vein ( GitHub - camlp5/pa_ppx_migrate: PPX Rewriter to help write AST migrations for Ocaml (using Camlp5 and pa_ppx) , GitHub - camlp5/pa_ppx_hashcons: PPX rewriter to mechanize hash-consing (after the method of Filliatre and Conchon) , others ) and a bunch of other PPX rewriters. Really, I wanted to see what people were doing with regexps and PPX.

And based on seeing ppx_regexp and tyre, I’ve started hacking away on something to do something similar, but based on Perl’s syntax. So thank you for pointing me at those!

Following along from @ninjaaron 's ports of a couple of my Perl scripts to OCaml, and then @Frederic_Loyer 's pointing me at ppx_regexp, I decided to write some more “perl-ish” regexp PPX rewriters, and use 'em to port some Perl scripts. So: GitHub - camlp5/pa_ppx_perl: Camlp5-compatible PPX rewriters for Perl idioms .

I’ve implemented a few PPX extensions: match, split, pattern, subst, and tried to come up with a syntax that’s as much like the Perl regexp integration as possible. I’d be interested in any comments or suggestions.

As examples, I used 'em to rewrite ya_wrap_ocamlfind.ml (@ninjaaron 's OCaml version) a little – replacing the regexp bits:

(** -syntax camlp5o *)
let rec split_args cmd = function
  | "--" :: files -> List.rev cmd, files
  | [file] -> List.rev cmd, [file]
  | arg :: args -> split_args (arg :: cmd) args
  | [] -> failwith "please supply input arguments"
let split_args = split_args []

let envsubst s =
  let envlookup vname =
    match Sys.getenv_opt vname with
      Some v -> v
    | None -> failwith [%pattern {|ya_wrap_ocamlfind: environment variable <<${vname}>> not found|}] in
  let f s1 s2 =
    if s1 <> "" then envlookup s1
    else if s2 <> "" then envlookup s2
    else assert false in

  [%subst {|(?:\$\(([^)]+)\)|\$\{([^}]+)\})|} / {| f $1$ $2$ |} / g e] s

let discover_args f =
  let f' = open_in f in
  let line1 = input_line f' in
  close_in f';
  match [%match {|^\(\*\*(.*?)\*\)|} / strings] line1 with
  | None -> ""
  | Some (_, Some params) -> envsubst params
  | Some (_, None) -> ""

let () = 
  let cmd, files =
    Array.to_list Sys.argv |> List.tl |> split_args in
  let cmd = Filename.quote_command (List.hd cmd) (List.tl cmd) in

  List.iter (fun f ->
      let extra = discover_args f in
      let cmd = [%pattern {|${cmd} ${extra} ${f}|}] in
      Printf.fprintf stderr "%s\n%!" cmd;
      ignore (Sys.command cmd))
    files
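For anyone who wants to try this without camlp5 and pa_ppx_perl, a Stdlib-only sketch of the same envsubst. The optional lookup parameter is an addition for testing and defaults to Sys.getenv_opt; an unterminated "$(" or "${" is copied through unchanged, as the regexp version would leave it when nothing matches.

```ocaml
(* Stdlib-only sketch of envsubst: expands $(NAME) and ${NAME} and fails on
   unknown names. [lookup] is a hypothetical parameter added for testing;
   it defaults to Sys.getenv_opt. *)
let envsubst ?(lookup = Sys.getenv_opt) s =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i < n then
      if s.[i] = '$' && i + 1 < n && (s.[i + 1] = '(' || s.[i + 1] = '{') then begin
        let close = if s.[i + 1] = '(' then ')' else '}' in
        match String.index_from_opt s (i + 2) close with
        | Some j when j > i + 2 ->
          let name = String.sub s (i + 2) (j - i - 2) in
          (match lookup name with
           | Some v -> Buffer.add_string buf v
           | None ->
             failwith ("envsubst: environment variable <<" ^ name ^ ">> not found"));
          go (j + 1)
        | _ ->
          (* no well-formed reference here: keep the dollar sign literally *)
          Buffer.add_char buf '$';
          go (i + 1)
      end
      else begin
        Buffer.add_char buf s.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents buf
```

The explicit scan is what a %subst with the e flag saves you from writing, which is presumably the point of the PPX.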

and then I did the same to another script, META.pl, that I use to bash on auto-generated META files. The idea is this: when doing a “local build” of a project, I install the files into a local-install directory. Suppose the project (say, pa_ppx_perl) has two subdirectories (say, pa_perl and runtime) that produce two findlib packages (say, pa_ppx_perl and pa_ppx_perl_runtime). I generate META files for these (with proper version numbers). Then, when I want to install in a system-wide location, I want the package names to be pa_ppx_perl and pa_ppx_perl.runtime (notice that “_” turns into “.”).

META.pl accomplishes that little bit of hacking. For a small project with two subdirs it’s maybe overkill, but for pa_ppx (with 17 subdirs) it starts to be useful. So below I attach the Perl program, and then the OCaml program (nearly finished).
Perl:

#!/usr/bin/env perl

use strict ;
use IPC::System::Simple qw(systemx runx capturex $EXITVAL);
use String::ShellQuote ;
use File::Basename;

use Version ;

our %pkgmap = (
  'pa_ppx_perl_runtime' => 'pa_ppx_perl.runtime',
  'pa_ppx_perl' => 'pa_ppx_perl',
    );

{
  my $perlmeta = indent(2, fixdeps(capturex("./pa_perl/META.pl"))) ;
  my $rtmeta = indent(2, fixdeps(capturex("./runtime/META.pl"))) ;

  print <<"EOF";
$perlmeta

package "runtime" (
$rtmeta
)

EOF
}

sub fixdeps {
  my $txt = join('', @_) ;
  $txt =~ s,^(.*require.*)$, fix0($1) ,mge;
  return $txt ; 
}

sub fix0 {
  my $txt = shift ;
  $txt =~ s,"([^"]+)", '"'. fix($1) .'"' ,e;
  return $txt ;
}

sub fix {
  my $txt = shift ;
  my @l = split(/,/,$txt);
  my @ol = () ;
  foreach my $p (@l) {
    $p =~ s,^([^.]+), $pkgmap{$1} || $1 ,e ;
    push(@ol, $p);
  }
  $txt = join(',', @ol) ;
  return $txt ;
}

sub indent {
  my $n = shift ;
  my $txt = shift ;
  my $pfx = ' ' x $n ;
  $txt =~ s,^,$pfx,gm;
  return $txt ;
}

OCaml:

(** -syntax camlp5o *)

open Pa_ppx_utils

let pkgmap = [
      "pa_ppx_perl_runtime","pa_ppx_perl.runtime"
    ; "pa_ppx_perl","pa_ppx_perl"
  ]

let indent n txt =
  let pfx = String.make n ' ' in
  [%subst {|^|} / {|${pfx}|} / g m] txt

let fix txt =
  let l = [%split {|\s*,\s*|}] txt in
  let f s =
    match List.assoc s pkgmap with
      exception Not_found -> s
    | v -> v in
  let ol =
    l
    |> List.map (fun p ->
           [%subst {|^([^.]+)|} / {| f $1$ |} / e] p
         ) in
  String.concat "," ol

let fix0 txt =
  [%subst {|"([^"]+)"|} / {| "\"" ^ fix($1$) ^ "\"" |} / e] txt


let fixdeps txt =
  [%subst {|^(.*require.*)$|} / {| fix0($1$) |} / m g e] txt

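let capturex (cmd, args) =
  (* argv passed to open_process_args_in must include argv.(0) *)
  let channel = Unix.open_process_args_in cmd (Array.append [|cmd|] args) in
  let txt = Std.read_ic_fully ~channel () in
  (* close_process_in waits for the child; like Perl capturex, fail on a bad exit *)
  match Unix.close_process_in channel with
  | Unix.WEXITED 0 -> txt
  | _ -> failwith ("capturex: " ^ cmd ^ " failed")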

let perlmeta = indent 2  (fixdeps(capturex("./pa_perl/META.pl",[||]))) ;;
let rtmeta = indent 2 (fixdeps(capturex("./runtime/META.pl",[||]))) ;;
print_string [%pattern {|${perlmeta}

package "runtime" (
${rtmeta}
)
|}]
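For comparison, a sketch of the same transformation with no PPX at all, using only the Stdlib and following the Perl more literally (it does not trim spaces around commas, unlike the split above). fix_pkg and contains are hand-written helpers invented for this sketch; String only offers character search.

```ocaml
(* Stdlib-only sketch of META.pl: fixdeps, fix0, fix and indent. *)
let pkgmap =
  [ ("pa_ppx_perl_runtime", "pa_ppx_perl.runtime"); ("pa_ppx_perl", "pa_ppx_perl") ]

(* Rename the leading component (text before the first '.') of a package. *)
let fix_pkg p =
  let head, tail =
    match String.index_opt p '.' with
    | None -> (p, "")
    | Some i -> (String.sub p 0 i, String.sub p i (String.length p - i))
  in
  (match List.assoc_opt head pkgmap with Some h -> h | None -> head) ^ tail

let fix txt = String.split_on_char ',' txt |> List.map fix_pkg |> String.concat ","

(* Apply [fix] to the contents of the first "..." on the line. *)
let fix0 line =
  match String.index_opt line '"' with
  | None -> line
  | Some i -> (
    match String.index_from_opt line (i + 1) '"' with
    | None -> line
    | Some j ->
      String.sub line 0 (i + 1)
      ^ fix (String.sub line (i + 1) (j - i - 1))
      ^ String.sub line j (String.length line - j))

(* Naive substring test. *)
let contains ~sub s =
  let n = String.length sub and m = String.length s in
  let rec go i = i + n <= m && (String.sub s i n = sub || go (i + 1)) in
  go 0

let fixdeps txt =
  String.split_on_char '\n' txt
  |> List.map (fun l -> if contains ~sub:"require" l then fix0 l else l)
  |> String.concat "\n"

(* Prefix every line with n spaces, except the empty tail after a final
   newline (under /m, Perl does not match ^ there either). *)
let indent n txt =
  let pfx = String.make n ' ' in
  let lines = String.split_on_char '\n' txt in
  let last = List.length lines - 1 in
  lines
  |> List.mapi (fun i l -> if i > 0 && i = last && l = "" then l else pfx ^ l)
  |> String.concat "\n"
```

It is noticeably longer than both the Perl and the pa_ppx_perl version, mostly because of the hand-rolled line and quote handling that the regexps do implicitly.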