Upcoming Caqti Release - A Heads-Up about Time Zones

I am about to release Caqti 1.6.0, which will address some issue which was introduced in 1.4.0. Some of them have been addressed already, but I think one particularly unfortunate issue remains, which warrants attention for those affected. And I would also like to provide an opportunity to raise a second opinion on this before I go ahead with the release.

Let me first post the drafted release notes:

  • Set the time zone of PostgreSQL connections to UTC to mitigate an undesirable implicit conversion to the local time zone for timestamp. This issue was exposed by the specification of field types introduced in version 1.4.0. Earlier versions worked as expected, if only accidentally, since the time zone is ignored when a string is converted to a timestamp. While this change makes timestamp more usable again for storing UTC time stamps, I strongly recommend using timestamp with time zone since it’s interpretation is unambiguous. The API reference is now updated with details about how the ptime OCaml type is mapped for different database systems.
  • Drop specification of OCaml string as SQL text for PostgreSQL. This is due to issues with implicit conversions and function overloading when the desired type on the SQL side is char, varchar, or jsonb.
  • Add Caqti_type.redact to protect sensitive information from being logged.
  • Only log parameters if $CAQTI_DEBUG_PARAM is set to “true”.

The essential issue is first bullet point. The problem occurs if

  • you have already adapted to or deployed version 1.4.0 or later,
  • the time zone of your database is not UTC or equivalent,
  • you are using the timestamp aka timestamp without time zone type in SQL code, and
  • values enter this type from the OCaml client side (either via ptime or ctime).

Before 1.4.0, the time values were sent as pure stings to be interpreted by PostgreSQL, in which case the behaviour was to ignore the UTC time zone, so that the timestamp ended up as UTC. Since 1.4.0, Caqti sends the value as timestamp with time zone, which by PostgreSQL is turned into a local time when implicitly converted to timestamp. The latter seems to be the intended, or at least default, usage of timestamp according to the PostgreSQL manual, though as I’ll explain next, this makes the type not very useful. In other words, Caqti 1.4.0 turned desirable incorrect behaviour into undesirable correct behaviour.

So, why not embrace timestamp as it’s meant to be used? I found a draft of SQL 1992 which states that “datetime data types that contain time fields (TIME and TIMESTAMP) are maintained in Universal Coordinated Time (UTC), with an explicit or implied time zone part.”. So, that’s good as long as we can infer the correct time zone on the client side. But in the case of PostgreSQL, the timestamp values are stored as-is, which can be verified by creating a table, storing values, changing the time zone, and selecting. The presented value remains the same. This means that the interpretation of a timestamp as local time changes when the time zone changes, which can happen due to daylight saving time, due to transfer of a service to another site in a different time zone, or even from session to session based on what the client requests.

For that reason, Caqti 1.6.0 will set the time zone of the session to UTC to at least ensure stability if timestamp is used. This is already done for MariaDB due to a similar issue with datetime. A change like this should normally be done on a major release, though I consider this a bug fix, since the it is hard to use timestamp correctly in recent versions. But please let me know if you think setting the time zone to UTC should wait until a major version, or if we should avoid it altogether. We can also change ptime and ctime back to send strings, thus restoring the pre-1.4.0 behaviour which relied on a somewhat incorrect conversion. The difference between sending strings and setting the time zone to UTC occurs if the application relies on using local time somewhere in the SQL code.

The recommendation I draw from this, though, is to always use timestamp with time zone to represent absolute time. This has no overhead on storage compared to timestamp, as they are both 8 bytes long. The time zone is not stored in either case; the main difference I can see is whether the time zone is included in the textual representation and how conversions work.

Thanks to @pw374 for pointing out this and other issues.

8 Likes

Thank you very much @paurkedal!

We’ve updated tezos-indexer to use caqti 1.6.
I haven’t formally compared the performance against prior versions (but there’s no very obvious difference, otherwise I’d probably have postponed it until knowing more about the situation).