RFC: multifile ATD definition support via import statements

Hello,

We’re adding support for splitting ATD type definitions into smaller files. Until now, there was no convenient way to reference types defined in other ATD files, which resulted in large interface files (example).

I would like a second pair of eyes to sanity-check the spec, which is copy-pasted below from the pull request, before this goes live in atd 4.0.0.


Import declarations

An ATD file may import other ATD modules using import declarations.
Import declarations must appear after any top-level annotations and before
any type definitions.

Syntax:

import module.path as alias

The as clause is optional. Without it, the local name of the imported
module is the last component of the dotted path (e.g. import foo.bar
binds the local name bar).

Type names from an imported module are referenced using dot notation:
alias.typename (or lastcomponent.typename when no alias is given).
For example, if a module types is imported, the type date from
that module is written types.date in type expressions.
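The naming rule above can be sketched in a few lines of Python. This is only an illustration of the spec text, not code from the atd tools; the function names local_module_name and qualified_type_name are hypothetical.

```python
from typing import Optional


def local_module_name(path: str, alias: Optional[str] = None) -> str:
    """Return the local name bound by `import <path> [as <alias>]`."""
    if alias is not None:
        return alias
    # Without an `as` clause, the last component of the dotted path is used.
    return path.split(".")[-1]


def qualified_type_name(path: str, type_name: str,
                        alias: Optional[str] = None) -> str:
    """Return how a type from the imported module is written in type expressions."""
    return f"{local_module_name(path, alias)}.{type_name}"
```

With these definitions, local_module_name("foo.bar") gives "bar" and qualified_type_name("types", "date") gives "types.date", matching the two examples in the spec.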

Annotations on the path or alias allow language-specific backends to
override the module name used in generated code. The annotation
<ocaml name="..."> (or the equivalent for another target language)
on the path controls how the module is referenced in generated output,
while the same annotation on the as clause controls the local alias
name used in the generated code.

Examples:

(* Simple import: local name is "common" *)
import mylib.common

(* Import with an alias *)
import mylib.common as c

(* Using an imported type in a definition *)
type event = {
  id : string;
  timestamp : common.date;
}

(* Language-specific name annotation on the path *)
import mylib.common <ocaml name="Mylib_common">

(* Language-specific name annotation on the alias *)
import mylib.common as c <ocaml name="Common">

Warning:

Dotted module paths (e.g. import foo.bar.baz) are an experimental
feature. Each code generator maps them to file paths in its own way and
there is currently no guarantee of consistent behavior across backends.
When possible, prefer single-component module names (e.g. import baz
or import foo as bar). Support for dotted module paths may be removed
in a future release.

Nice. Is it intended to replace things like this:

type t_error <ocaml from="Utils"> = abstract

Yes. Now, you’d have a Utils.atd file that you’d use as:

import utils
type errors = utils.t_error list

And if Utils is an OCaml module but there’s no ATD file Utils.atd or utils.atd, that’s fine too as long as it exposes a compatible interface.

The OCaml translation (mli) is

type errors = Utils.t_error list

val errors_of_yojson : Yojson.Safe.t -> errors
val yojson_of_errors : errors -> Yojson.Safe.t
val errors_of_json : string -> errors
val json_of_errors : errors -> string

module Errors : sig
  type nonrec t = errors
  val of_yojson : Yojson.Safe.t -> t
  val to_yojson : t -> Yojson.Safe.t
  val of_json : string -> t
  val to_json : t -> string
end

looks nice

questions:

  • what’s the error message if you do import foo.bar.t where t is a type definition inside of module foo.bar? I can see myself trying to import a single definition (which I think is not supported, based on your description of the feature)…
  • would it make sense to be able to import a type definition?

tl;dr: scroll down to “Final proposal”

There’s no error message from atdml, atdpy, etc. because they don’t consult or require an ATD file that defines foo.bar.t. The error messages depend on the translation to the target language. OCaml is simple because module names are capitalized, so there’s no confusion between modules and types. Python doesn’t care as long as the object quacks like a duck. Not sure about other languages.

It’s not supported but maybe it’s a mistake.

Right now, the value of import statements is that they sit at the beginning of the file and they make it clear that we depend on an external module.

What is the benefit of importing just one type?

It clarifies the dependencies in the generated code. It seems like a good thing to support.

Can we reuse Python syntax and semantics?

Python:

import a.b
import a.c

from x.y import z

OCaml equivalent assuming Python semantics:

module A = struct
  module B = A.B
  module C = A.C
end

module Z = X.Y

or

module A = struct
  type b = A.b
  type c = A.c
end

type z = X.Y.z

or

module A = struct
  let b = A.b
  let c = A.c
end

let z = X.Y.z

Other languages: ?

The Python syntax can be used to import different kinds of objects that require different syntax in OCaml and probably many other languages. So if we want to import types, our ATD syntax should make clear what’s a type and what’s a module.

Should we even allow the import of whole modules?

Maybe not. Importing only what we need is considered best practice in Python, TypeScript, etc. We might as well enforce it.

Here’s a suggested syntax for ATD:

from a.b import c, d
from x.b import c as c_, e as ee

where a.b and x.b are modules while c, c_, d, e, and ee are types.

The import ... syntax would not be supported.

We find the same syntax in Python. In ATD, we assume the names on the right-hand side of import are type names.

ATD type expressions would not contain dotted identifiers.

An alternate syntax proposal for importing specific types

In this proposal, module names are capitalized like in OCaml but not type names. This allows
the shorthand notation import Module.type_ where it’s clear that we’re importing a type and not a whole module.

from A.B import c, d
import X.B.e

The distinction between module names and other names based on capitalization is familiar to OCaml users but not to others. Besides, the syntax import X.B.e is redundant unless we also drop the from-import syntax… but from-import is preferred because it avoids repeating the module path for each imported item.

Would this new from-import syntax map well to most languages?

Probably. from a import b doesn’t require an actual import statement in the target language. It’s just some local renaming.

Final proposal

  • remove the import a.b syntax for importing a whole module
  • add support for from a import c, d for importing types
  • type name aliasing with as is supported because it’s needed to avoid name conflicts e.g. from a import b as b_
  • module names are still lowercase identifiers (same as type names)
  • support for dotted module paths remains experimental (from a.b import c)
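To make the proposed from-import syntax concrete, here is a rough Python sketch of a parser for a single declaration. It is not how the atd parser is implemented; the identifier pattern (lowercase letters, digits, underscores), the rejection of duplicate local names, and the function name parse_from_import are all assumptions made for illustration.

```python
import re
from typing import Dict, Tuple

# Assumption: identifiers are lowercase, as in the final proposal.
IDENT = r"[a-z_][a-z0-9_]*"
FROM_IMPORT = re.compile(rf"^from\s+({IDENT}(?:\.{IDENT})*)\s+import\s+(.+)$")
ITEM = re.compile(rf"^({IDENT})(?:\s+as\s+({IDENT}))?$")


def parse_from_import(line: str) -> Tuple[str, Dict[str, str]]:
    """Parse one declaration; return (module_path, {local_name: original_name})."""
    m = FROM_IMPORT.match(line.strip())
    if m is None:
        raise ValueError(f"not a from-import declaration: {line!r}")
    module_path, items = m.group(1), m.group(2)
    names: Dict[str, str] = {}
    for item in items.split(","):
        im = ITEM.match(item.strip())
        if im is None:
            raise ValueError(f"invalid import item: {item.strip()!r}")
        original, alias = im.group(1), im.group(2)
        local = alias or original
        # Assumption: two imported types may not share a local name.
        if local in names:
            raise ValueError(f"duplicate local name: {local!r}")
        names[local] = original
    return module_path, names
```

For instance, parse_from_import("from x.b import c as c_, e as ee") returns the module path "x.b" together with the mapping from local names to original names, while a whole-module declaration such as "import a.b" is rejected with a ValueError.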

So far we have two proposals. I’m adding a third one and I would like to implement just one.

import (current implementation)

import a

type t = a.b list

This has the advantage of clarifying that a.b is defined externally.

from-import (proposed instead of the current implementation)

from a import b

type t = b list

hybrid

This third proposal combines properties of the proposals above:

  • clarify which types are being imported at the beginning of the ATD file
  • preserve dot notation to indicate external origin

from a import b

type t = a.b list

This syntax supports aliasing on the module path (as in the original import implementation) but not on the type names. A more complete example is

from x.a import b, c
from y.a as a2 import b, c

type t1 = a.b list
type t2 = a2.b list
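One consequence of the hybrid form is that a tool could check every dotted reference against the import declarations. The Python sketch below illustrates that idea only; the helper names declare and is_imported are hypothetical, and treating a repeated local module name as an error is an assumption rather than something the proposal states.

```python
from typing import Dict, Iterable, Optional, Set


def declare(imports: Dict[str, Set[str]], path: str,
            type_names: Iterable[str], alias: Optional[str] = None) -> None:
    """Record `from <path> [as <alias>] import <type_names>`."""
    # The local module name is the alias, else the last path component.
    local = alias if alias is not None else path.split(".")[-1]
    # Assumption: reusing a local module name is rejected.
    if local in imports:
        raise ValueError(f"module name already bound: {local!r}")
    imports[local] = set(type_names)


def is_imported(imports: Dict[str, Set[str]], ref: str) -> bool:
    """Tell whether a dotted reference such as `a.b` names an imported type."""
    module, sep, type_name = ref.rpartition(".")
    return sep == "." and type_name in imports.get(module, set())
```

After declaring the two imports from the example (x.a with no alias, y.a with alias a2), the references a.b and a2.b are accepted, whereas a.d and the undotted b are not.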

When the user wants a shorter name without a dot, they will define a type alias:

from a import b

type b = a.b
type t = b list

I detailed the revised design for the import feature (the “hybrid” proposal above) in GitHub issue #456 of ahrefs/atd, “Revise syntax to import specific types”.