[ANN] YAMLx: pure-OCaml YAML 1.2/1.1 library

Hello,

I’m excited to announce the availability of YAMLx, a complete implementation of the YAML 1.2 and 1.1 standards. It is intended to support most uses of YAML, including as a configuration file format for new applications and for analyzing the YAML files of other applications.

For a fuller description, check out the release notes.

Get started with `opam install yamlx`, then use the `yamlx` companion command to check that it parses and interprets your favorite YAML files correctly.

Here’s an example showing the use of the library to convert YAML to JSON:

```ocaml
(* Read YAML from stdin with YAMLx, print JSON to stdout with Yojson

   This is a demo showing how to use the YAMLx library with the default
   settings. 'YAMLx.Values.one_of_yaml_exn' offers options to restrict
   the input language or to force the interpretation as YAML 1.1 instead of
   YAML 1.2.

   Compile with:
     ocamlfind opt -o yaml-to-json \
       -package yamlx,yojson -linkpkg yaml_to_json.ml
*)

open Printf

let rec yojson_of_yamlx (x : YAMLx.value) : Yojson.Safe.t =
  match x with
  | Null _ -> `Null
  | Bool (_, x) -> `Bool x
  (* `Intlit keeps the exact decimal text, so 64-bit values that don't
     fit in an OCaml int are not lost *)
  | Int (_, x) -> `Intlit (Int64.to_string x)
  | Float (_, x) -> `Float x
  | String (_, x) -> `String x
  | Seq (_, xs) -> `List (List.map yojson_of_yamlx xs)
  | Map (_, xs) ->
      (* JSON objects only allow string keys: fail on any other key
         and report the location *)
      `Assoc (List.map (fun (loc, k, v) ->
        match (k : YAMLx.value) with
        | String (_, k) -> (k, yojson_of_yamlx v)
        | _ ->
            ksprintf failwith "%s: only string keys are supported"
              (YAMLx.default_format_loc loc)
      ) xs)

let () =
  YAMLx.register_exception_printers ();
  stdin
  |> In_channel.input_all
  |> YAMLx.Values.one_of_yaml_exn
  |> yojson_of_yamlx
  |> Yojson.Safe.pretty_to_channel stdout
  |> print_newline
```
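The header comment mentions forcing YAML 1.1 interpretation instead of YAML 1.2. One well-known reason this matters is how plain (unquoted) scalars resolve to booleans: the YAML 1.1 bool type also accepts the y/n, yes/no and on/off spellings, while the YAML 1.2 core schema accepts only true and false in three capitalizations. The sketch below is not YAMLx code; the `version` type and `resolve_bool` function are hypothetical and only model these two rules.

```ocaml
(* Hypothetical illustration, not part of YAMLx: which plain scalars
   resolve to booleans under YAML 1.1 vs the YAML 1.2 core schema.
   Quoted scalars are always strings and are not considered here. *)
type version = V1_1 | V1_2

let resolve_bool (version : version) (s : string) : bool option =
  match version, s with
  (* Accepted by both versions *)
  | _, ("true" | "True" | "TRUE") -> Some true
  | _, ("false" | "False" | "FALSE") -> Some false
  (* Extra spellings from the YAML 1.1 bool type *)
  | V1_1, ("y" | "Y" | "yes" | "Yes" | "YES" | "on" | "On" | "ON") -> Some true
  | V1_1, ("n" | "N" | "no" | "No" | "NO" | "off" | "Off" | "OFF") -> Some false
  (* Anything else is not a boolean *)
  | _ -> None
```

Under these rules, a country code such as NO in an unquoted list becomes false with YAML 1.1 and stays the string "NO" with YAML 1.2.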

Funding

YAMLx is currently released under the AGPL. There is an ongoing fundraiser: once a funding goal is reached, the license will switch to the permissive ISC license for everyone. Donors above a certain threshold receive an immediate commercial license. See my GitHub Sponsors profile for details.

Nice! Is there a way to get AST elements with file/line positions? I don’t see any mention of file positions in the README, and they are important for error reporting.

DESIGN.md mentions:

[…] we […] want the following features: node locations (for error messages) […]

So it seems the answer is yes? Then again, I see that @claude is a co-author of all the significant commits, and I don’t know whether I can trust that document.

Thanks! This is a much-needed addition to the OCaml library ecosystem. Sponsoring!

Yes. Locations are available in both representations, with positions counted both in bytes and in Unicode code points:

  • node type: the AST/CST
  • value type: the JSON-like interpretation (example shown in the original post)
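For non-ASCII text the two units differ: a byte offset suits slicing the raw input, while a code-point offset is closer to the column a user sees. As an illustration only (this is not part of the YAMLx API, and `code_point_offset` is a hypothetical helper), here is a sketch using the OCaml standard library (4.14 or later, for `String.get_utf_8_uchar`) that converts a byte offset in a UTF-8 string into a code-point offset:

```ocaml
(* Hypothetical helper, not part of YAMLx: count the Unicode code points
   that start before byte offset [byte_ofs] in the UTF-8 string [s].
   Requires OCaml >= 4.14 for String.get_utf_8_uchar. *)
let code_point_offset (s : string) (byte_ofs : int) : int =
  if byte_ofs < 0 || byte_ofs > String.length s then
    invalid_arg "code_point_offset";
  let rec loop i count =
    if i >= byte_ofs then count
    else
      (* Decode one UTF-8 sequence starting at byte [i]. An invalid
         sequence still counts as one code point and the decoder tells
         us how many bytes to skip. If [byte_ofs] falls inside a
         multi-byte sequence, that code point is counted. *)
      let d = String.get_utf_8_uchar s i in
      loop (i + Uchar.utf_decode_length d) (count + 1)
  in
  loop 0 0
```

For example, in "h\xc3\xa9llo" (the word "héllo", where é takes two bytes), byte offset 3 corresponds to code-point offset 2.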

I had to insist that the coding agent provide location info; it “forgot” to do so in its first pass of code generation.

This is an interesting topic and there’s much to say about it. In short, I am ultimately responsible for the whole project, regardless of the tools or people that assist me. I have been programming in OCaml professionally for over 20 years, and maintaining my reputation matters to me.

But yes, there will be mistakes and bugs, as in any software project. My efforts have focused primarily on ensuring quality from a distance. In particular, I have paid attention to:

  • the public interface, YAMLx.mli, since it is the hardest thing to change later.
  • maintainability: the intent for each node in the project (project, subproject, file, function, …) must be obvious to a reader who doesn’t have context.
  • correctness: every feature has tests (about 600 in total, including the 371 tests from the standard YAML test suite); I also did some light fuzzing, which found three cases of uncaught generic exceptions.
  • security: avoid memory-unsafe operations and test for known YAML-related vulnerabilities (alias expansion bombs, deeply nested documents that could cause stack overflows).
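Both attacks can be illustrated with a toy model. In an alias expansion bomb ("billion laughs"), each anchored node refers several times to the previous one, so a few lines of input denote an exponentially large tree; deep nesting threatens the stack of any recursive traversal. A common defense is to count nodes as if aliases were expanded and stop as soon as a node budget or a depth limit is exceeded. The sketch below assumes a simplified document type; `node`, `expanded_size` and `laughs` are hypothetical and not necessarily how YAMLx does it.

```ocaml
(* Toy document model, NOT the YAMLx types: an alias names an anchored
   node instead of copying it, so a small input can denote a huge tree. *)
type node =
  | Scalar of string
  | Seq of node list
  | Alias of string (* refers to a node in the anchor table *)

exception Too_many_nodes
exception Too_deep

(* Walk the document as if every alias were expanded, but give up as soon
   as the node budget or the nesting-depth limit is exceeded. The depth
   limit also bounds the recursion depth of [visit] itself. An unknown
   anchor name raises Not_found. *)
let expanded_size ~max_nodes ~max_depth anchors root =
  let count = ref 0 in
  let rec visit depth n =
    if depth > max_depth then raise Too_deep;
    incr count;
    if !count > max_nodes then raise Too_many_nodes;
    match n with
    | Scalar _ -> ()
    | Seq xs -> List.iter (visit (depth + 1)) xs
    | Alias name -> visit depth (Hashtbl.find anchors name)
  in
  visit 0 root;
  !count

(* Billion-laughs shape: anchor l<i> is a sequence of ten aliases
   to anchor l<i-1>, and l0 is a single scalar. *)
let laughs levels =
  let anchors = Hashtbl.create 16 in
  Hashtbl.replace anchors "l0" (Scalar "lol");
  for i = 1 to levels do
    let prev = Alias (Printf.sprintf "l%d" (i - 1)) in
    Hashtbl.replace anchors (Printf.sprintf "l%d" i)
      (Seq (List.init 10 (fun _ -> prev)))
  done;
  (anchors, Alias (Printf.sprintf "l%d" levels))
```

For example, `laughs 9` uses only ten anchors but describes about 2.2 billion nodes, and `expanded_size ~max_nodes:1_000_000 ~max_depth:1_000` rejects it with `Too_many_nodes` after about a million visits instead of expanding it.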

Some of this could be improved with systematic safeguards (preferred) or checklists (“skills”).

Regarding the documentation specifically: I regularly tell the agent to review the Git commit history and update the relevant documents (changelog, API docs). It generally does a better job than I could at reading and writing English text, and I find it particularly good at summarizing anything that is already well explained. The flip side is that everything must be kept clear and fully documented. I haven’t found any instances of “hallucinations” on this project, possibly because I don’t ask it open questions in English; I only request summaries of local material. The main issues have been out-of-date paragraphs about this or that feature, which are easy to fix.
