I don’t really like to base a release announcement on bashing another project, but this whole project is motivated by my dissatisfaction with To.ml—the only TOML library for OCaml, so here we go. OTOML is a TOML library that you (hopefully) can use without writing long rants afterwards.
In short:
- TOML 1.0-compliant (To.ml is not).
- Good error reporting.
- Makes it easy to look up nested values.
- Bignum and calendar libraries are pluggable via functors.
- Flexible pretty-printer with indentation.
OPAM: otoml
GitHub: dmbaturin/otoml
Now let’s get to details.
TOML is supposed to be human-friendly so that people can use it as a configuration file format. For that, both developer and end-user experience must be great. To.ml provides neither. I’ve been using To.ml in my projects for a long time, and
Standard compliance
TOML is neither minimal nor obvious: it is much larger than the commonly used subset, and the spec is neither consistent nor easy to read. Even so, To.ml fails at rather well-known things, like dotted keys, arrays of tables, and heterogeneous arrays.
OTOML passes all tests in the test suite, except the tests related to bignum support. Those tests fail because the default implementation maps integers and floats to the native 31/63-bit OCaml types. More on that later.
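To make the three features named above concrete, here is a minimal sketch that feeds one small sample of each to the parser. It uses only `Otoml.Parser.from_string_result`; the sample documents and the `parses` helper are made up for illustration and are not taken from the test suite.

```ocaml
(* Hypothetical samples, one per TOML 1.0 feature mentioned above. *)
let samples = [
  ("dotted keys", "server.http.port = 8080");
  ("array of tables", "[[fruit]]\nname = \"apple\"\n[[fruit]]\nname = \"pear\"");
  ("heterogeneous array", "mixed = [1, \"two\", 3.0]");
]

(* Hypothetical helper: true if the parser accepts the document. *)
let parses source =
  match Otoml.Parser.from_string_result source with
  | Ok _ -> true
  | Error _ -> false

let () =
  List.iter
    (fun (name, source) ->
      Printf.printf "%s: %s\n" name (if parses source then "ok" else "rejected"))
    samples
```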
Error reporting
Let’s look at error reporting. To.ml’s response to any parse error is a generic error with just line and column numbers.
utop # Toml.Parser.from_string "foo = [" ;;
- : Toml.Parser.result =
`Error
("Error in <string> at line 1 at column 7 (position 7)",
{Toml.Parser.source = "<string>"; line = 1; column = 7; position = 7})
Menhir offers excellent tools for error reporting, so I took time to make descriptive messages for many error conditions (there are generic “syntax error” messages still, but that’s better than nothing at all).
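In application code, these messages can be passed straight to the user. A minimal sketch, using `Otoml.Parser.from_string_result` (the same function as in the utop sessions in this section); `load_config` is a hypothetical helper name, not part of the library:

```ocaml
(* Hypothetical helper: parse a TOML string and report syntax errors on stderr. *)
let load_config (source : string) : Otoml.t option =
  match Otoml.Parser.from_string_result source with
  | Ok toml -> Some toml
  | Error msg ->
    (* The message already carries the position and a trailing newline. *)
    prerr_string msg;
    None

let () =
  match load_config "foo = [" with
  | Some _ -> print_endline "config loaded"
  | None -> print_endline "config rejected"
```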
utop # Otoml.Parser.from_string_result "foo = [" ;;
- : (Otoml.t, string) result =
Error
"Syntax error on line 1, character 8: Malformed array (missing closing square bracket?)\n"
utop # Otoml.Parser.from_string_result "foo = {bar " ;;
- : (Otoml.t, string) result =
Error
"Syntax error on line 1, character 12: Key is followed by end of file or a malformed TOML construct.\n"
Looking up nested values
Nested sections are common in configs and should be easy to work with. This is how you do it in OTOML:
utop # let t = Otoml.Parser.from_string "[this.is.a.deeply.nested.table]
answer=42";;
val t : Otoml.t =
Otoml.TomlTable
[("this",
Otoml.TomlTable...
utop # Otoml.find t Otoml.get_integer ["this"; "is"; "a"; "deeply"; "nested"; "table"; "answer"] ;;
- : int = 42
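Since the path is a plain list of strings, it is easy to build helpers on top of `Otoml.find`. The sketch below is one such convenience wrapper; `find_path` is a hypothetical name, and the naive split on `.` assumes that no key in the path contains a dot itself.

```ocaml
(* Hypothetical helper: look up a value by a dot-separated path.
   Caveat: does not handle quoted keys that contain dots. *)
let find_path toml accessor path =
  Otoml.find toml accessor (String.split_on_char '.' path)

let () =
  let t = Otoml.Parser.from_string "[this.is.a.deeply.nested.table]\nanswer=42" in
  let answer = find_path t Otoml.get_integer "this.is.a.deeply.nested.table.answer" in
  Printf.printf "answer = %d\n" answer
```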
For comparison, this is how it was done in To.ml:
utop # let toml_data = Toml.Parser.(from_string "
[this.is.a.deeply.nested.table]
answer=42" |> unsafe);;
val toml_data : Types.table = <abstr>
utop # Toml.Lenses.(get toml_data (
key "this" |-- table
|-- key "is" |-- table
|-- key "a" |-- table
|-- key "deeply" |-- table
|-- key "nested" |-- table
|-- key "table" |-- table
|-- key "answer"|-- int ));;
- : int option = Some 42
Extra dependencies
The TOML spec includes first-class RFC3339 dates, for better or worse. The irony is that most uses of TOML (and, indeed, most configuration files in the world) don’t need that, so it’s arguably feature bloat, but if we set out to support TOML as it’s defined, that question is academic.
The practical implication is that if the standard library of a language doesn’t include a datetime type, a TOML library has to decide how to represent those values. To.ml makes ISO8601 a hard dependency, so if you don’t use dates, you end up with a useless dependency. And if you prefer another library (or need functionality not present in ISO8601), you end up with two libraries: one you chose to use, and one more forced on you.
Same goes for the arbitrary precision arithmetic. Most configs won’t need it, but the standard demands it, so something needs to be done.
Luckily, in the OCaml land we have functors, so it’s easy to make all these dependencies pluggable. So I made it a functor that takes three modules.
module Make (I : TomlInteger) (F : TomlFloat) (D : TomlDate) :
TomlImplementation with type toml_integer = I.t and type toml_float = F.t and type toml_date = D.t
This is how to use Zarith for big integers and keep the rest unchanged:
(* No signature ascription:
`module BigInteger : Otoml.Base.TomlInteger` would make the type t abstract,
which is inconvenient.
*)
module BigInteger = struct
type t = Z.t
let of_string = Z.of_string
let to_string = Z.to_string
let of_boolean b = if b then Z.one else Z.zero
let to_boolean n = not (Z.equal n Z.zero)
end
module MyToml = Otoml.Base.Make (BigInteger) (Otoml.Base.OCamlFloat) (Otoml.Base.StringDate)
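The same pattern works without any extra dependency. The sketch below plugs in the standard library's `Int64` to get full 64-bit integers on every platform. It assumes that `TomlInteger` requires exactly the four functions used in the Zarith example, and that the functor result exposes the same `Parser`, `find` and `get_integer` interface as the default `Otoml` module; `Int64Integer` and `Toml64` are hypothetical names.

```ocaml
(* Hypothetical stdlib-only integer module.
   Assumption: TomlInteger needs only these four functions. *)
module Int64Integer = struct
  type t = Int64.t
  let of_string = Int64.of_string
  let to_string = Int64.to_string
  let of_boolean b = if b then 1L else 0L
  let to_boolean n = not (Int64.equal n 0L)
end

module Toml64 =
  Otoml.Base.Make (Int64Integer) (Otoml.Base.OCamlFloat) (Otoml.Base.StringDate)

let () =
  (* Assumption: Toml64 mirrors the interface of the default Otoml module. *)
  let t = Toml64.Parser.from_string "big = 9223372036854775807" in
  let n = Toml64.find t Toml64.get_integer ["big"] in
  print_endline (Int64.to_string n)
```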
Printing
To.ml’s printer can print TOML at you, that’s for certain. No indentation, nothing to help you navigate nested values.
utop # let toml_data = Toml.Parser.(from_string "[foo.bar]\nbaz=false\n [foo.quux]\n xyzzy = [1,2]" |> unsafe) |> Toml.Printer.string_of_table |> print_endline;;
[foo.bar]
baz = false
[foo.quux]
xyzzy = [1, 2]
We can do better:
utop # let t = Otoml.Parser.from_string "[foo.bar]\nbaz=false\n [foo.quux]\n xyzzy = [1,2]" |> Otoml.Printer.to_channel ~indent_width:4 ~collapse_tables:false stdout;;
[foo]
[foo.bar]
baz = false
[foo.quux]
xyzzy = [1, 2]
val t : unit = ()
utop # let t = Otoml.Parser.from_string "[foo.bar]\nbaz=false\n [foo.quux]\n xyzzy = [1,2]" |> Otoml.Printer.to_channel ~indent_width:4 ~collapse_tables:false ~indent_subtables:true stdout;;
[foo]
[foo.bar]
baz = false
[foo.quux]
xyzzy = [1, 2]
val t : unit = ()
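Because the printer writes to any output channel, saving a config to disk needs no extra API. The sketch below writes a value to a temporary file with the options from the sessions above, then parses it back as a sanity check. `write_config` and `read_file` are hypothetical helpers, and the sample document is made up.

```ocaml
(* Hypothetical helper: write a TOML value to a file with the printer
   options used in the sessions above. *)
let write_config path toml =
  let oc = open_out path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () ->
      Otoml.Printer.to_channel
        ~indent_width:4 ~collapse_tables:false ~indent_subtables:true oc toml)

(* Read a whole file into a string using only the standard library. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

let () =
  let original = Otoml.Parser.from_string "[server]\nport = 8080" in
  let path = Filename.temp_file "otoml_demo" ".toml" in
  write_config path original;
  (* Parse the printed output back and look up the same key. *)
  let reparsed = Otoml.Parser.from_string (read_file path) in
  let port = Otoml.find reparsed Otoml.get_integer ["server"; "port"] in
  Printf.printf "port after round trip: %d\n" port;
  Sys.remove path
```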
Maintenance practices
Last but not least, good maintenance practices are also important, not just good code. To.ml is at 7.0.0 now. It has a CHANGES.md file, but I have yet to see the maintainers document what the breaking change is, who’s affected, and what they should do to make their code compatible.
For example, in 6.0.0 the breaking change was a rename from TomlLenses to Toml.Lenses. In an earlier release, I remember the opposite rename. Given the standard compatibility problems going unfixed for years, that’s like rearranging furniture when the roof is leaking.
I promise not to do that.
Conclusion
I hope this library will help make TOML a viable configuration file format for OCaml programs.
It’s just the first version of course, so there’s still room for improvement. For example, the lexer is especially ugly: due to TOML being highly context-sensitive, it involves massive amounts of lexer hacks for context tracking. Maybe ocamllex is the wrong tool for the job and it should be replaced with something else (since I’m using Menhir’s incremental API anyway, the parser is not tied to any lexer API).
The printer is also less tested than the parser, so there may be unhandled edge cases. It also has some cosmetic issues like newlines between parent and child tables.
Any feedback and patches are welcome!