Using CLI arguments through modules and configuration files

erhan · November 8, 2020, 2:33pm

Hello everyone.

I was looking for common approaches and best practices regarding CLI argument passing and configuration files. I couldn’t find much after googling, so here are my questions:

1. I am getting arguments with the cmdliner library and passing them through functions, sometimes through several layers of functions. I am not sure that’s the correct way of doing it. Let’s say I have a verbose flag defined and I have logs in various modules. How can I read this flag wherever it is needed? Could you please share any resources on saving arguments and accessing them from different modules?
2. Is there a configuration file concept in OCaml? Files such as .yaml and .json.

CraigFe · November 8, 2020, 6:14pm

Hi @erhan,

For the specific case of logging, my advice is to use the Logs library or – failing that – to just copy the approach that it takes verbatim. Logs looks after some global mutable state that contains the current logging level of the program, so you don’t have to bother with propagating this information throughout your program. When I’m reading OCaml code, I’m not concerned with whether any particular function might emit log lines, so I don’t need this to be made painfully obvious at each call-site. If you do use Logs, it comes pre-packaged with Cmdliner specifications for setting the logging level in the logs.cli package (example here).

Generally, there are several options for propagating state throughout an OCaml program. In roughly decreasing order of explicitness:

1. Pass all params explicitly to the functions that need them, precisely as you’re doing right now.
2. Pack params into a “context” record (or object) that is passed explicitly where it’s needed. (cf. Dune’s Context and Super_context.) It’s possible to hide this record passing in a Reader monad, which I have toyed with in the past but don’t recommend with today’s OCaml.
3. Pack params into a “context” module that is then used to instantiate functors elsewhere in your program. (cf. Ppxlib.Ast_builder as a way of propagating a ~loc flag everywhere.)
4. Use global mutable state, as in Logs.

I’ve seen all four of these used sensibly in OCaml programs; the best one will depend on your particular application requirements / how much you care about tracking which parts of the program use which arguments.
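As a rough illustration of the two middle options (the context record and the context module), here is a minimal sketch using only the standard library. The option names (`verbose`, `jobs`) and the `describe` function are made up for the example; they are not taken from Dune or Ppxlib.

```ocaml
(* Hypothetical options; none of these names come from the thread. *)
type context = { verbose : bool; jobs : int }

(* Context record: pass one record explicitly instead of many arguments. *)
let describe ctx target =
  if ctx.verbose then Printf.sprintf "building %s with %d jobs" target ctx.jobs
  else Printf.sprintf "building %s" target

(* Context module: a signature used to instantiate a functor once. *)
module type CONTEXT = sig
  val verbose : bool
  val jobs : int
end

module Make_builder (C : CONTEXT) = struct
  (* Functions in here read C.* without taking a context parameter. *)
  let describe target =
    if C.verbose then Printf.sprintf "building %s with %d jobs" target C.jobs
    else Printf.sprintf "building %s" target
end

let () =
  (* In a real program these values would come from the parsed command line. *)
  let ctx = { verbose = true; jobs = 4 } in
  print_endline (describe ctx "lib");
  (* A local module turns runtime values into a functor argument. *)
  let module B = Make_builder (struct
    let verbose = ctx.verbose
    let jobs = ctx.jobs
  end) in
  print_endline (B.describe "bin")
```

The record keeps every dependency visible at each call site. The functor hides it after instantiation, at the cost of threading the module structure through the program instead.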

Regarding your second point, AFAIK there’s no generic library for managing config files in OCaml (i.e. what Cosmiconfig provides for NPM). Every OCaml library I’ve seen that uses one tends to roll its own logic for it. You could of course use Yojson or OCaml-Yaml to read a file in one of those formats, but you’ll end up managing the details yourself. The lightweight approach is to use environment variables, since Cmdliner will handle that boilerplate for you; managing config files is a pain, particularly w.r.t. things like respecting XDG_CONFIG_HOME and its analogues on Windows.
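To make the "details you end up managing yourself" concrete, here is a small standard-library sketch of two of them: the precedence between a command-line value, an environment variable and a default, and locating a config file under `XDG_CONFIG_HOME`. The names `resolve`, `config_path`, `MYTOOL_COLOR`, `mytool` and `config.json` are assumptions for the example, and Windows is deliberately not handled. Cmdliner's own environment-variable support is not shown here.

```ocaml
(* Resolve a setting with the usual precedence: command line, then
   environment, then built-in default. [getenv] is a parameter so the
   lookup can be exercised without touching the real environment. *)
let resolve ?(getenv = Sys.getenv_opt) ~cli ~env_var ~default () =
  match cli with
  | Some v -> v
  | None -> (
      match getenv env_var with
      | Some v when v <> "" -> v
      | _ -> default)

(* Where a config file would live on a Unix-like system: $XDG_CONFIG_HOME
   if set and non-empty, otherwise $HOME/.config. Windows is not handled. *)
let config_path ?(getenv = Sys.getenv_opt) ~app () =
  let base =
    match getenv "XDG_CONFIG_HOME" with
    | Some dir when dir <> "" -> dir
    | _ ->
        let home = Option.value (getenv "HOME") ~default:"." in
        Filename.concat home ".config"
  in
  Filename.concat (Filename.concat base app) "config.json"

let () =
  (* MYTOOL_COLOR and "mytool" are made-up names for the example. *)
  let color = resolve ~cli:None ~env_var:"MYTOOL_COLOR" ~default:"auto" () in
  Printf.printf "color=%s config=%s\n" color (config_path ~app:"mytool" ())
```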

Chet_Murthy · November 8, 2020, 7:54pm

The last time I built a complex system with a bunch of components (a blockchain) I used JSON files with ppx_yojson marshalling, for all configuration. There’s the problem of “distributing” the config objects to the various subsystems, and I don’t want to minimize that, but at least, having the wireline representation be automatically derived from the types was … very valuable. It allowed me to think solely in terms of the “configuration datatype”.

Right now I’m writing rather complex PPX rewriters (an attribute grammar evaluator-generator) and again, I’ve got a “demarshaller” from OCaml ASTs to a defined datatype, generated by a PPX rewriter, so that again, I don’t think in terms of the “wireline” OCaml AST, but rather the datatype that my PPX rewriter wants to consume.

Also: you know about ppx_cmdliner, yes? The last time I used cmdliner, I used that tool, and heartily recommend it. And also, of course, I heartily, heartily recommend cmdliner, but you already know that.

Last: logging is a special case. @CraigFe is correct, I think, in suggesting that you should use a logging library that holds all the config information for all its client modules. So: that logging library would have a way to configure which client modules should be logging verbosely, and the client modules themselves would ask the logging library “what’s my logging-level? INFO? ERROR? TRACE?” or the equivalent.
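A toy version of that arrangement might look like the sketch below. It is not the Logs API (Logs offers a comparable mechanism through `Logs.Src`); `Mini_log`, its functions and the `Parser` client are hypothetical names used only to show a library that owns per-module levels.

```ocaml
(* A toy logging library that owns the level of every client module. *)
module Mini_log = struct
  type level = Error | Info | Trace

  let rank = function Error -> 0 | Info -> 1 | Trace -> 2

  (* Global state: a default level plus per-source overrides. *)
  let default_level = ref Error
  let overrides : (string, level) Hashtbl.t = Hashtbl.create 8

  let set_default l = default_level := l
  let set_source src l = Hashtbl.replace overrides src l

  (* What a client module asks: "what is my logging level?" *)
  let level_of src =
    Option.value (Hashtbl.find_opt overrides src) ~default:!default_level

  (* A message is emitted when its level is at or below the source's level. *)
  let enabled src l = rank l <= rank (level_of src)

  let log src l msg =
    if enabled src l then prerr_endline (Printf.sprintf "[%s] %s" src msg)
end

(* A client module only names its source; it never sees the CLI flags. *)
module Parser = struct
  let src = "parser"
  let run () = Mini_log.log src Mini_log.Trace "tokenizing"
end

let () =
  (* The main program would set these from parsed command-line options. *)
  Mini_log.set_default Mini_log.Info;
  Mini_log.set_source "parser" Mini_log.Trace;
  Parser.run ()
```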

xavierleroy · November 9, 2020, 10:37am

Re: configuration files, there are a few libraries in OPAM:

aryx · November 9, 2020, 1:41pm

Regarding point 1), I tend to use globals. Just define a Flag.ml file, put your globals in there (let verbose = ref true), populate the globals from your cmdline library (I use Arg, but I guess Cmdliner can probably do that too), and then anywhere you need to know the value of the cmdline flag, just use !Flag.verbose.

This is not Haskell, you can use imperative code when it makes your life easier.
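For reference, a minimal sketch of the globals approach with the standard-library Arg module could look like this. The `Flag` module stands in for a separate file; the option names, the `output` flag, the `report` helper and the default of `false` for `verbose` are choices made for the example, not part of the post above.

```ocaml
(* Stands in for a separate flag.ml file holding the globals. *)
module Flag = struct
  let verbose = ref false
  let output = ref "a.out"
end

(* Option specification for the standard-library Arg module. *)
let spec =
  [ ("-verbose", Arg.Set Flag.verbose, " Print progress messages");
    ("-o", Arg.Set_string Flag.output, "FILE Write output to FILE") ]

(* Any module can read the flag without it being passed as a parameter. *)
let report msg = if !Flag.verbose then prerr_endline msg

let () =
  let inputs = ref [] in
  (* Anonymous arguments are collected as input files. *)
  Arg.parse spec (fun f -> inputs := f :: !inputs) "usage: tool [options] files";
  report
    (Printf.sprintf "writing %s from %d input(s)" !Flag.output
       (List.length !inputs))
```

The trade-off is the one discussed earlier in the thread: call sites stay short, but nothing in a function's type says that it reads `Flag.verbose`.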

jjb · November 9, 2020, 2:14pm

If you want to use globals with cmdliner, you might find it convenient to use some code such as:

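(* Assumes Base (or Core) is opened, for the labelled [List.fold_right] and
   [Caml.exit], and the Cmdliner 1.0-era [Term.eval] / [Term.info] API. *)
(** Extension of Cmdliner supporting lighter-weight option definition *)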
module Cmdliner : sig
  include module type of Cmdliner

  val mk : default:'a -> 'a Term.t -> 'a ref
  (** [mk ~default term] is a ref which, after [parse] is called, contains
      the value of the command line option specified by [term]. *)

  val parse : Term.info -> (unit -> unit Term.ret) -> unit
  (** [parse info validate] parses the command line according to the options
      declared by calls to [mk], using manual and version [info], and
      calling [validate] to check usage constraints not expressible in the
      [Term] language. *)
end = struct
  include Cmdliner

  (** existential package of a Term and a setter for a ref to receive the
      parsed value *)
  type arg = Arg : 'a Term.t * ('a -> unit) -> arg

  (** convert a list of arg packages to a term for the tuple of all the arg
      terms, and apply it to a function that sets all the receiver refs *)
  let tuple args =
    let pair (Arg (trm_x, set_x)) (Arg (trm_y, set_y)) =
      let trm_xy = Term.(const (fun a b -> (a, b)) $ trm_x $ trm_y) in
      let set_xy (a, b) = set_x a ; set_y b in
      Arg (trm_xy, set_xy)
    in
    let init = Arg (Term.const (), fun () -> ()) in
    let (Arg (trm, set)) = List.fold_right ~f:pair args ~init in
    Term.app (Term.const set) trm

  let args : arg list ref = ref []

  let mk ~default arg =
    let var = ref default in
    let set x = var := x in
    args := Arg (arg, set) :: !args ;
    var

  let parse info validate =
    match Term.eval (Term.(ret (const validate $ tuple !args)), info) with
    | `Ok () -> ()
    | `Error _ -> Caml.exit 1
    | `Help | `Version -> Caml.exit 0
end
