How to expose date/time types in a library nicely?

I recently made a library to read XLSX files, and one aspect of that is exposing columns with date/time info as Core.Date and Core.Time. I almost immediately got a request from someone to not have a huge dependency like Core (or even Core_kernel) to expose two types.

Now I’m wondering, what is the best way to expose a date or time type in a library (given that the library doesn’t depend on Core for any other reason)? It seems like there are at least five different options:

  1. Core.Date
  2. Calendar
  3. Ptime
  4. Make my own custom date and time types (or just expose them as int64)
  5. Functorize everything over a Date module

I’m not actually willing to do (5) since that sounds terrible. (4) would be easy and sort-of make everyone happy, although it would also make the interface worse and just make it so everyone has to do a type conversion to their preferred date/time formats. (1) is tempting but the more I think about it, the more I’m concerned that Core’s extremely-specific version requirements are something I don’t want to inflict on people who don’t even want to use it.

I’m not really sure how to pick between Calendar and Ptime. Calendar seems more suited for my use-case (exposing dates and times as reasonable types that can be manipulated), although Ptime seems more popular and has the right types (although it doesn’t have calendar functions).

Is this another Async/Lwt thing or is there a “standard-ish way to represent dates and times”?

There is also https://github.com/hhugo/odate

If you just need a representation of POSIX date and time, I think ptime is a good option (I’m not familiar with Core.Date). If you want calendar computation, then in addition to calendar, there is netdate from ocamlnet. I haven’t used it, but it looks well designed, in case you don’t mind the bundling.

How about Unix.tm, that seems pretty standard?

I like this idea. Pick a good representation but leave it to the client of the library to convert this to something else for calendrical calculations because this is not the focus of the library. I would consider using a Unix-style timestamp: time since 00:00:00 GMT, Jan. 1, 1970, in seconds as an int or float.

I think you should follow @lindig’s advice. Relying on the Unix module is not a good idea because there are places where it’s not available.

float is kind of standard in the OCaml world because that’s what Unix.gettimeofday returns.

Out of curiosity, what representation is used in XLSX files?

If XLSX uses an explicit year, month, day representation and you want to represent this as seconds since Jan 1, 1970, you have to convert between the two. Here is some code that does this. However, it ignores subtleties such as leap seconds and time zones.

type date =
  { year:     int
  ; month:    int
  ; day:      int
  ; hour:     int
  ; min:      int
  ; sec:      int
  }

(** [is_leapyear] is true, if and only if a year is a leap year *)
let is_leapyear year =
        year mod 4    = 0
    &&  year mod 400 <> 100
    &&  year mod 400 <> 200
    &&  year mod 400 <> 300

let ( ** ) x y    = Int64.mul (Int64.of_int x) y
let sec           = 1L
let sec_per_min   = 60 ** sec
let sec_per_hour  = 60 ** sec_per_min
let sec_per_day   = 24 ** sec_per_hour

(* The following calculations are based on the following book: Nachum
Dershowitz, Edward M. Reingold: Calendrical calculations (3. ed.).
Cambridge University Press 2008, ISBN 978-0-521-88540-9, pp. I-XXIX,
1-479, Chapter 2, The Gregorian Calendar *)

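(* Days from 1970-01-01 to the given Gregorian date (valid for years >= 1).
   The formula from the book yields a day number counted from 0001-01-01
   as day 1; subtracting the day number of 1970-01-01 (719163) shifts the
   result to the Unix epoch. *)
let days_since_epoch yy mm dd =
  let unix_epoch  = 719163  in
  let y'          = yy - 1  in
  let correction  =
    if mm <= 2                then 0
    else if is_leapyear yy    then -1
                              else -2
  in
    365*y' + y'/4 - y'/100 + y'/400 +
    (367 * mm - 362)/12 + correction + dd - unix_epoch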

let seconds_since_epoch d =
  let ( ++ )        = Int64.add in
    (days_since_epoch d.year d.month d.day ** sec_per_day)
    ++ (d.hour ** sec_per_hour)
    ++ (d.min  ** sec_per_min)
    ++ (d.sec  ** sec)


This is how, using ptime, I’ve exposed dates in Logarion:

http://cgit.orbitalfox.eu/logarion/tree/src/core/meta.ml#n1

According to https://stackoverflow.com/questions/981655/how-to-represent-a-datetime-in-excel the underlying type is a float counting days from 1900 (with the length of the day being 1 regardless of DST or leap seconds?). I would expose this error-prone representation as is, letting the user deal with this thing.

It’s actually the number of days since Dec. 31st, 1899 (since Jan 1, 1900 is “1”). This is supposed to be a high-level library, so I’d rather not leave it in this insane and useless format (the conversion wasn’t difficult to write; the difficulty is making the output nice).

I’ll have to think about this more. It seems like float is the only option that will actually work for everyone, but I’m not sure if I’m willing to make the primary interface return unitless floats :\

Depending on how you represent the data, you could tag those floats with something like Seconds_since_epoch of float. It’s a little verbose but hard to misread!

type t =
  | Int of int
  | Float of float
  | String of string
  | Seconds_since_epoch of float
  | ...
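
To make the tagging idea concrete, here is a minimal sketch of how a client might consume such a value. The type name `cell` and the function `describe` are hypothetical and only for illustration; the elided constructors are left out so the snippet compiles on its own.

```ocaml
(* Hypothetical cell type: the timestamp carries its unit in the constructor. *)
type cell =
  | Int of int
  | Float of float
  | String of string
  | Seconds_since_epoch of float

(* The pattern match forces callers to treat timestamps separately from
   plain floats, so the two cannot be mixed up by accident. *)
let describe = function
  | Int i -> Printf.sprintf "int %d" i
  | Float f -> Printf.sprintf "float %g" f
  | String s -> Printf.sprintf "string %S" s
  | Seconds_since_epoch s -> Printf.sprintf "timestamp %.0f s since epoch" s
```

The cost is one extra constructor per unit, but a reader of the interface never has to guess what a bare float means.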

It’s a bit odd, but absolutely not useless; it’s a rather good format compared to the insanity you find in other standards, where you need a timezone database to figure out when things occur.

I would simply translate it to make it compatible with the result of Unix.gettimeofday and indicate in the documentation that this is the number of seconds since the epoch (this will e.g. let users use Unix.{gm,local}time or other libraries to convert the stamp to the calendar fields of their choice).
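
A rough sketch of that translation, with no dependencies beyond the standard library. The function names are made up for illustration. It assumes the workbook uses the default 1900 date system (not the 1904 one) and that the serial is 61 or later, i.e. on or after 1900-03-01. Excel counts a nonexistent 1900-02-29, so earlier serials are off by one day. Under those assumptions 1970-01-01 has serial 25569.

```ocaml
(* Excel serial number of 1970-01-01 in the 1900 date system. *)
let unix_epoch_serial = 25569.

(* Every Excel day is exactly 86400 s long: no DST, no leap seconds. *)
let seconds_per_day = 86400.

(* Serial date (whole days plus time of day as a fraction) to seconds since
   the Unix epoch. Assumes the 1900 date system and serial >= 61. *)
let unix_seconds_of_excel_serial serial =
  (serial -. unix_epoch_serial) *. seconds_per_day

(* Inverse direction, under the same assumptions. *)
let excel_serial_of_unix_seconds seconds =
  seconds /. seconds_per_day +. unix_epoch_serial
```

For example, serial 36526. (2000-01-01) maps to 946684800., which a client can hand to Unix.gmtime or any other date library.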