[ANN] ocamlmig, a tool to rewrite ocaml code, and complement `[@@deprecated]`

Hi,

I’m glad to announce ocamlmig, a command line tool for rewriting ocaml source code with access to scope and type information.

As the simplest example of what it’s intended for, let’s say an opam-installed library A provides this interface:

val new_name : int -> int

val old_name : int -> int
[@@migrate { repl = Rel.new_name }]

and your repository contains a file b.ml:

let _ = A.old_name 1

then you could run:

$ git diff b.ml
$ ocamlmig migrate -w
$ git diff b.ml
-let _ = A.old_name 1
+let _ = A.new_name 1

Obviously, it’s not limited to renames.

What I meant by “complement [@@deprecated]” is that instead of providing a textual description, such as [@@deprecated "please use this thing instead"], you get to provide an executable description. The goal is to reduce the friction when the interface of a library evolves. If people get in the habit of running this regularly (after every opam upgrade/dune pkg lock, say), then it could also be a way to get users to switch to new interfaces without having to deprecate the old interfaces immediately.

Additionally, using that and a couple of other builtin transformations like removing opens, you can execute some refactorings, without learning anything like ppxlib or the ocaml ast, for instance:

If that piqued your interest, here is more information about what ocamlmig does and how to use it.

This is decidedly a work in progress: many things are not fully implemented, and it needs a lot of polish, but the existing functionality should already be interesting as is.


Very cool. Is this a strict superset of the [@@deprecated] attribute? I.e., does it also trigger the deprecation warning?

No, the attribute doesn’t trigger any warning. You’d have to write [@@migrate ...] [@@deprecated ...] to get that behavior.
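As a rough sketch of that combination, reusing the interface from the first post: the two attributes can sit side by side on the same val. The body of module A and the deprecation message are made up for illustration, and the [@@migrate] payload is copied from the announcement as is.

```ocaml
module A : sig
  val new_name : int -> int

  (* Both attributes attach to old_name: migrate is read by ocamlmig,
     deprecated is the one that makes the compiler emit an alert. *)
  val old_name : int -> int
  [@@migrate { repl = Rel.new_name }]
  [@@deprecated "use new_name instead"]
end = struct
  (* Hypothetical implementation, only here so the sketch compiles. *)
  let new_name x = x + 1
  let old_name = new_name
end

let () = assert (A.new_name 1 = 2)
```

Callers of A.old_name would then get the usual deprecation alert from the compiler, and could presumably run ocamlmig migrate to switch to A.new_name.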

Really nice project, this looks promising!

I tried to use it on Frama-C, which is a really large project and does (maybe) weird things with dune, and I get errors on source files such as:

Uncaught exception:
  
  ("failed to read dune file for "
   (path src/kernel_internals/typing/allocates.ml)
   (Sys_error "src/kernel_internals/typing/dune: No such file or directory"))

Raised at Base__Error.raise in file "src/error.ml" (inlined), line 9, characters 21-37

It seems that it does not find the dune file, which is in the src folder. The project builds correctly using dune build @install, and @check seems to work (no output, so I guess it does?).

Did I miss something in the instructions to use ocamlmig?

Here is the full backtrace:

Raised at Base__Error.raise in file "src/error.ml" (inlined), line 9, characters 21-37
Called from Base__Error.raise_s in file "src/error.ml", line 10, characters 26-47
Called from Ocamlmig__Dune_files.ppx in file "lib/dune_files.ml", line 232, characters 6-86
Called from Ocamlmig__Transform_migration.run.(fun) in file "lib/transform_migration.ml", line 2176, characters 31-65
Called from Ocamlmig__Transform_common.process_ast in file "lib/transform_common.ml", line 450, characters 4-80
Called from Ocamlmig__Fmast.update_structure in file "lib/fmast.ml", line 110, characters 13-29
Called from Ocamlmig__Transform_common.process_file' in file "lib/transform_common.ml", line 470, characters 2-53
Called from Ocamlmig.with_reported_ocaml_exn in file "lib/ocamlmig.ml", line 268, characters 6-10
Re-raised at Location.report_exception.loop in file "parsing/location.ml", line 979, characters 14-25
Called from Ocamlmig.make_report_exn.(fun) in file "lib/ocamlmig.ml", line 253, characters 12-65
Re-raised at Ocamlformat_ocaml_common__Location.report_exception.loop in file "vendor/ocamlformat/vendor/ocaml-common/location.ml", line 1009, characters 14-25
Called from Ocamlmig.make_report_exn.(fun) in file "lib/ocamlmig.ml", line 256, characters 16-70
Called from Ocamlmig.with_reported_ocaml_exn in file "lib/ocamlmig.ml", line 270, characters 4-16
Called from Ocamlmig.migrate.(fun) in file "lib/ocamlmig.ml", lines 328-332, characters 26-92
Called from Base__List0.iter in file "src/list0.ml", line 66, characters 4-7
Called from Ocamlmig.migrate.(fun) in file "lib/ocamlmig.ml", lines 296-336, characters 14-53
Called from Ocamlmig.with_ocaml_exn in file "lib/ocamlmig.ml", line 261, characters 7-19
Re-raised at Location.report_exception.loop in file "parsing/location.ml", line 979, characters 14-25
Called from Ocamlmig.make_report_exn.(fun) in file "lib/ocamlmig.ml", line 253, characters 12-65
Re-raised at Ocamlformat_ocaml_common__Location.report_exception.loop in file "vendor/ocamlformat/vendor/ocaml-common/location.ml", line 1009, characters 14-25
Called from Ocamlmig.make_report_exn.(fun) in file "lib/ocamlmig.ml", line 256, characters 16-70
Called from Ocamlmig.with_ocaml_exn in file "lib/ocamlmig.ml", line 261, characters 30-42
Called from Command.For_unix.run.(fun) in file "command/src/command.ml", lines 3388-3399, characters 8-31
Called from Base__Exn.handle_uncaught_aux in file "src/exn.ml", line 126, characters 6-10

I think this is because the tool expects a dune file in every folder, whereas Frama-C uses (include_subdirs unqualified) in the top-level dune file and thus does not have these files.

Thanks for the report! That’s a shortcoming of ocamlmig, which doesn’t handle the (include_subdirs ..) library field (although maybe it shouldn’t fail regardless, as the code could proceed without it). I fixed that, and a couple of other problems that showed up in frama-c. Now, I can do this:

$ ocamlmig mig -side-migrations ocamlmig.stdlib_to_stdlib src/libraries/project/project.ml
--- src/libraries/project/project.ml
+++ src/libraries/project/project.ml
@@ -451,10 +453,12 @@
 let magic = 9 (* magic number *)
 
 let save_projects selection projects filename =
-  let cout = open_out_bin (filename : Filepath.Normalized.t :> string) in
-  output_value cout System_config.Version.id;
-  output_value cout magic;
-  output_value cout !Graph.Blocks.cpt_vertex;
+  let cout =
+    Out_channel.open_bin (filename : Filepath.Normalized.t :> string)
+  in
+  Marshal.to_channel cout System_config.Version.id [];
+  Marshal.to_channel cout magic [];
+  Marshal.to_channel cout !Graph.Blocks.cpt_vertex [];
   let states : (t * (string * State.state_on_disk) list) list =
     Q.fold
       (fun acc p ->

You can pick up the fixes from the main branch of v-gb/ocamlmig on GitHub, either with opam pin, or by building the binary from the repository directly (the binary is _build/default/bin/main.exe).
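As for why the rewrites in the diff above should preserve behavior: the Stdlib documentation describes output_value chan v as equivalent to Marshal.to_channel chan v with an empty flag list. Here is a small sketch that checks this on a throwaway value. The helper write_with and the sample value are illustrative and not taken from Frama-C, and Out_channel and In_channel require OCaml 4.14 or later.

```ocaml
(* Run a writer against a temporary binary file and return the bytes
   it produced. *)
let write_with (f : out_channel -> unit) : string =
  let path = Filename.temp_file "ocamlmig_demo" ".bin" in
  let oc = Out_channel.open_bin path in
  f oc;
  Out_channel.close oc;
  let contents = In_channel.with_open_bin path In_channel.input_all in
  Sys.remove path;
  contents

(* Pre-migration style. *)
let old_style = write_with (fun oc -> output_value oc (1, "two", [ 3.0 ]))

(* Post-migration style, with the explicit empty flag list. *)
let new_style =
  write_with (fun oc -> Marshal.to_channel oc (1, "two", [ 3.0 ]) [])

let () = assert (String.equal old_style new_style)
```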


Hello, this is very interesting! I read the “what it does” page, but couldn’t get a sense of how it uses type info – is there someplace where that is described? If it’s not, no worries, I’ll read the source. But I figured I’d ask first.

Right, the documentation doesn’t provide this kind of information. It’s probably not easy to figure out what’s going on from a quick look at the code, but looking at call sites of Build.Type_index.* and Build.Artifacts.* should be a good starting point.

Off the top of my head, some type information is used for:

  • context-dependent rewrites, like this
  • when f has a migration in code such as f (A (g ())), depending on the replacement for f, we may end up naming the argument let something = A (g ()) in ..., which changes type inference order, which can thus break type based disambiguation, which we avoid by instead creating let something : the-type = A (g ())
  • all the code for adding/removing opens consults the typing environment to figure out which identifiers are impacted
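The second point can be reproduced in plain OCaml, independently of ocamlmig. A minimal sketch, where the module M and the functions f and g are made up for illustration: naming the argument without an annotation loses the expected type that made the unqualified constructor resolvable.

```ocaml
module M = struct
  type t = A of int
end

let g () = 41
let f (x : M.t) = match x with M.A n -> n + 1

(* Accepted: the parameter type M.t is known when the argument is checked,
   so the unqualified constructor A resolves to M.A (warning 40, which is
   off by default, reports this). *)
let direct = f (A (g ()))

(* Rejected if uncommented: there is no expected type at the binding, so A
   is looked up in scope and is unbound.
   let broken = let something = A (g ()) in f something *)

(* Accepted: the annotation restores the expected type. *)
let named = let something : M.t = A (g ()) in f something

let () = assert (direct = 42 && named = 42)
```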

Thank you for the pointers, I’ll have a look. I remember when I got typpx working (it had been abandoned (a bit) by the author), it seemed nontrivial to do PPX-like rewriting to the typed-tree. That is, to reuse PPX rewriter code on Parsetree, but somehow take advantage of type information.

I’ll have to look to see how you did it.


Hi,

I released a new version of ocamlmig in opam, whose main change is to avoid reformatting everything in codebases that don’t use ocamlformat. Instead, only subexpressions touched by a rewrite are reformatted.
It also requalifies identifiers in more places to preserve their meaning (e.g. when replacing string_of_int by Int.to_string, there might be an Int module in scope that’s not Stdlib.Int. In such a case, ocamlmig would more often replace string_of_int by Stdlib.Int.to_string).
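To make the requalification point concrete, here is a hand-written sketch (not ocamlmig output) of a file that defines its own Int module. The local module and its behavior are hypothetical.

```ocaml
(* A project-local module that happens to be called Int (hypothetical). *)
module Int = struct
  let to_string (_ : int) = "not the stdlib"
end

let original = string_of_int 3

(* Naive replacement: resolves to the local Int, so the meaning changes. *)
let naive = Int.to_string 3

(* Requalified replacement: always the standard library function. *)
let requalified = Stdlib.Int.to_string 3

let () =
  assert (original = requalified);
  assert (naive <> original)
```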

Separately, I’ve thought about the recent addition of let+ operators in Cmdliner, and how one might migrate from the use of $ to them. Concretely, given:

let bistro () (`Dry_run dry_run) (`Package_names pkg_names) ... = the code
open Cmdliner
let term = Term.(const bistro $ Cli.setup $ Cli.dry_run $ ...)

you’d want to have instead:

open Cmdliner
let term =
  Term.(Syntax.(
    let+ () = Cli.setup
    and+ (`Dry_run dry_run) = Cli.dry_run
    and+ (`Package_names pkg_names) = ...
    ...
    in
    the code))

ocamlmig can now transform code this way, at the tip of the ocamlmig repo (not the last release). You can see it in the second commit in this branch (and further mechanical cleanups in the commits with “…” bubbles), but to explain a bit:

let bistro () (`Dry_run dry_run) (`Package_names pkg_names) ... = the code
open Cmdliner
let term = Term.(const bistro $ Cli.setup $ Cli.dry_run $ ...)

is first turned into:

open Cmdliner
let term = Term.(const (fun () (`Dry_run dry_run) (`Package_names pkg_names) ... -> the code)
                 $ Cli.setup $ Cli.dry_run $ ...)

which is then turned into the final code:

open Cmdliner
let term =
  Term.(Syntax.(
    let+ () = Cli.setup
    and+ (`Dry_run dry_run) = Cli.dry_run
    and+ (`Package_names pkg_names) = ...
    ...
    in
    the code))

The first step is done using ocamlmig replace -w -e 'const [%move_def __f] /// const __f'. In short, any time it sees const some-identifier, it tries to inline the definition of the identifier. In detail, the left side of the /// specifies the code to search for, and the right side what to replace it with. const ... searches for literally const applied to one argument. [%move_def __f] is trickier: it matches identifiers that are let-bound somewhere in the current file, removes said let binding, and recursively matches the right hand side of the binding against __f. Variables that start with two underscores name a term for use in the replacement expression.

The second step is done with:

ocamlmig replace -w \
  -e 'const (fun __p1 __p2 __p3 -> __body) $ __e1 $ __e2 $ __e3
      /// let open Syntax in let+ __p1 = __e1 and+ __p2 = __e2 and+ __p3 = __e3 in __body'

This is longer, but given the previous explanation, it’s hopefully fairly clear what this does. The only twist is that ocamlmig generalizes this search/replace for three elements into an n-ary version (implicitly, although perhaps it should be explicit).

And that’s it. So this is the full command that I used:

ocamlmig replace -w \
  -e 'const [%move_def __f] /// const __f' \
  -e 'const (fun __p1 __p2 __p3 -> __body) $ __e1 $ __e2 $ __e3
      /// let open Syntax in let+ __p1 = __e1 and+ __p2 = __e2 and+ __p3 = __e3 in __body'

which seems pretty reasonable considering the rewrite is somewhat sophisticated.
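For readers who want to convince themselves that the before and after forms compute the same thing, here is a self-contained sketch using a toy applicative in place of Cmdliner. The Term module below is an assumption made up for this example and is not the Cmdliner API; it only mimics the shape of const, $ and the binding operators. The names setup_t, dry_run_t and names_t are likewise hypothetical.

```ocaml
(* Toy applicative standing in for Cmdliner.Term. This is NOT the Cmdliner
   API; it only mimics the shape of const, $ and the let+ / and+ operators. *)
module Term = struct
  type 'a t = unit -> 'a

  let const x : 'a t = fun () -> x
  let ( $ ) (f : ('a -> 'b) t) (x : 'a t) : 'b t = fun () -> (f ()) (x ())

  module Syntax = struct
    let ( let+ ) (x : 'a t) (f : 'a -> 'b) : 'b t = fun () -> f (x ())
    let ( and+ ) (a : 'a t) (b : 'b t) : ('a * 'b) t = fun () -> (a (), b ())
  end
end

let setup_t : unit Term.t = Term.const ()
let dry_run_t : [ `Dry_run of bool ] Term.t = Term.const (`Dry_run true)

let names_t : [ `Package_names of string list ] Term.t =
  Term.const (`Package_names [ "a"; "b" ])

let bistro () (`Dry_run dry_run) (`Package_names pkg_names) =
  (dry_run, List.length pkg_names)

(* Before: positional application with const and $. *)
let before = Term.(const bistro $ setup_t $ dry_run_t $ names_t)

(* After: the same computation written with binding operators. *)
let after =
  Term.(Syntax.(
    let+ () = setup_t
    and+ (`Dry_run dry_run) = dry_run_t
    and+ (`Package_names pkg_names) = names_t in
    (dry_run, List.length pkg_names)))

let () = assert (before () = after ())
```

With three bindings, the let+ ... and+ ... form desugars to nested pairs built by and+ and a single call to let+, which is why one rule written for three elements has an obvious n-ary reading.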

In general, mechanizing a change can reduce the chance of accidentally modifying something, but in this specific case, ocamlmig also detects shadowing when moving code with [%move_def]. Shadowing would likely cause type errors or test failures, but if it didn’t, it’d be quite hard to catch during code review.
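To illustrate the kind of silent change that shadowing can cause when a definition is moved, here is a hand-written example (not ocamlmig output, and all names are made up). Both versions typecheck, but they disagree.

```ocaml
let limit = 10
let check n = n < limit (* refers to limit = 10 *)

let limit = 100 (* shadows the earlier limit for everything below *)

(* Calling through the binding still uses limit = 10. *)
let via_binding = check 50

(* Naively pasting the body of check at the use site picks up limit = 100. *)
let naive_inline = (fun n -> n < limit) 50

let () =
  assert (via_binding = false);
  assert (naive_inline = true)
```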

Finally, if you want to try this out on your code, I’ll note that ocamlmig replace is in flux, and that while the commands above work, obvious variations of them may not.


That level of automated refactoring is quite impressive. It seems likely that individual cmdliner users would need to adjust based on exactly their set-up – does move_def work across module boundaries?

The names const and $ are provided by cmdliner, so I expect that the command I showed above should work as is in most cases.

By contrast, the simplification in the third commit in the linked pull request is specific to the idiom in that project where values are tagged with single constructor polymorphic variants, so this rewrite may or may not make sense depending on usage.

[%move_def] looks for a let-binding in the current file. So it works across module boundaries, i.e. you could turn:

module M = struct
  let x = 1
end
let _ = M.x

into

module M = struct end
let _ = 1

But if M had a signature, or was defined in a separate compilation unit, then [%move_def] wouldn’t match them. Making these cases work would certainly be harder, and I don’t know of situations that would need this kind of expressivity.
