Modelling event-based system in OCaml

Hi everyone,

I’m using OCaml for a personal project (as personal projects should be fun, and I find OCaml fun). Recently I’ve been pulling hairs over a “similar events” design though. I’ve tried different approaches (e.g. using tuples, variants, polymorphic variants, records, recently looking into GADTs etc.) but none doesn’t feel correct and I encounter same issues over and over.

Even though I prefer to solve my own puzzles, I’m out of ideas and steam and decided to look for guidance :slight_smile:

Problem context

I want to develop event-based system. Initial prototype has ~30 events of approx. 5 families. They are quite similar to each other:

  • Every single one has ID and Timestamp (with possible extra identifiers coming later)
  • Every single one has a Metadata and Event Data
  • Event Data is event-specific - it will be subject to a typing per event, but I believe it’s not relevant for the problem itself so I will stub it
  • Meta Data can be empty or hold data comparable to other events (e.g. UserJoin and SessionNew have session element that point to same Session entity)
  • Some events can have almost same shape and structure but shouldn’t be treated nor processed as the same (stumbled upon it when modelling as plain tuples) - e.g.
  • Metadata is the most important part and can either be of a single data group or a product of different

Ultimately those events will be coming from Database or API in JSON/Sexp form and serialized/deserialized. I expect cardinality to be finite (initial version maybe around 50-60) but constantly expanding.

Example in OCaml-ish (not sure if valid written ad hoc - something close to one of the attempts) should give a rough idea:

type event_id = EventID of int
type event_ts = EventTS of int
type event_common = { id: event_id; ts: event_ts }
type event_data = EventData
type session = SessionData
type user = UserData
type personal_config = PersonalConfig
type meta_empty = EmptyMeta
type meta_with_user = { user: user }
type meta_with_session = { session: session }
type meta_with_user_and_personal_config = { config: personal_config }
type meta_with_user_and_session = { user: user; session: session }
type meta_with_user_and_session_and_personal_config = { user: user; session: session; config: pesonal_config  }

(* More meta + different sets of meta data *)

(* Some example events *)
type user_event =
  | UserNew of (event_common * meta_empty * event_data)
  | UserLogin of (event_common * meta_with_user * event_data)
  | UserJoin of (event_common * meta_with_user_and_session * event_data)

type meta_event = 
  | Tick of (event_common * meta_empty * event_data)
  | UpdateEvent of (event_common * meta_empty * event_data)

type session_event = 
  | SessionNew of (event_common * meta_with_session * event_data)
  | SessionExport of (event_common * meta_with_user_and_session * event_data)

(* etc. *)

Problem itself

However I model I crash against same set of issues

  • Grabbing ID from Event is difficult
    • e.g. get_id require implementation for everything or I can’t get types correctly
  • It’s difficult to modelling and grabbing subset of data
    • for get_user in above example I could mash all possible types, but adding new subset, for example available_credits might explode other types and add to the boilerplate code
    • or I get a some long chain of boxing like: BusinessEvent WithUserData UserJoin ...
  • Lack of ability to have a different groupings - e.g. SessionExport and UserJoin both have session and thus could be subject to get_session function

I’m also aiming at following design goals:

  • I want to be able to type long chain Event progression, i.e. make sure that in chain there is no UserLeft before UserJoined
  • More importantly have a single place to inspect (or type) all possible Event rules (e.g. processor after seeing UserNew should spawn IssueTrialPeriod)
  • Bag all of those in a list and process in multi-phase (e.g. UserNew, Tick, Tick, UserJoin) would be processed in two passes - first pass would remove all Tick events (preprocessing phase) and the second would receive only UserNew, UserJoin
  • Need to ensure that no Event exists that cannot be handled by existing system

I understand that’s a lot, so I’d like to mention that I’m not asking for a working code nor a working solution, but I’m struggling for a long time and would be grateful for any finger pointing at a solution /similar conundrum discussion or (I don’t exclude) thought/assumption error?

Hello! Did you consider using polymorphism to share the ID and Timestamp accross all event kinds?

type 'a event = { id: event_id; ts: event_ts; kind: 'a }

type user =
  | UserNew of ...
  | UserLogin of meta_with_user * ...
  | UserJoin of meta_with_user_and_session * ...

type user_event = user event

let get_id event = event.id

Then, you could like into polymorphic variants to typecheck functions that only apply to some kinds of events:

type user =
  [ `UserNew of meta_empty * event_data
  | `UserLogin of meta_with_user * event_data
  | `UserJoin of meta_with_user_and_session * event_data
  ]

type meta = 
  [ `Tick of meta_empty * event_data
  | `UpdateEvent of meta_empty * event_data
  ]

type session =
  [ `SessionNew of meta_with_session * event_data
  | `SessionExport of meta_with_user_and_session * event_data
  ]

type any = [ user | meta | session ] (* a useful alias *)

let get_session event = match event.kind with
  | `UserJoin (meta, _) | `SessionExport (meta, _) ->
      (meta : meta_with_user_and_session).session

Typechecking a sequence of events for business-specific properties is generally not worth it, because the interesting bits are too dynamic (e.g. UserJoin "someone" :: UserLeft "someone else" :: ... won’t be catched). @dinosaure has a nice blog post on GADTs and state machine if you still want to try :slight_smile:

3 Likes

I actually decided to try GADTs because of this blog post and will admit that idea whispers to me, but I only tried couple implementations and GADTs still too magical for me.

Regarding chain verification it not might be worth for this particular project but I’m working on an complex, distributed event system professionaly and having a tool that solves it would be a godsend (and possibly trigger swift OCaml adoption :wink: )

But I agree that it might be too much for a type system, I recently read about F* and might turn there for the reasoning part, those are another areas of my research and I’d like to settle for some proofing system.

As for the approaches:

I did something similar, but investigating your code closely there are some subtleties I might have missed. The actual paste (in this iteration event shared is called event_data):

type ('meta, 'data) event = { event : event_data; meta : 'meta; data : 'data }
type user_event =
  | Joined of (user_session, unit) event
  | Left of (user_session, unit) event
  | Input of (user_session, string) event

…and then I also tried (tupled):

(* ... *)
type user_event =
  [ `UserJoin of UserSessionEvent.t
  | `UserLeft of UserSessionEvent.t
  | `UserInput of UserSessionEvent.t * UserInputT.t ]
[@@deriving show]

(* ... *) 
type business_event =
  [ user_event | session_event | session_render_event | user_render_event ]
[@@deriving show]

Today morning I woke up with 2 new thoughts. One would be to eat the bullet and write all the variants for extracting common parts. Even if I have 1000 events I could macro variant selector in editor and forget about trying to be smart.

Second was about making a “God Record” event spec and just match on it for most of the functions. While scope might grow I don’t see a scenario where there’s more than >20 metadata sets (and in fact I hardly can think about more than 5), so maybe the best road would be:

type event_god_record = {
  event : event_data;
  user : user_data option;
  session : session_data option;
  config : config_data option;
}

let get_id e : event_id = e.event.id
let get_user_id : event_god_record -> user_id option = function
  | { user = Some user_data; _} -> Some user_data.id
  | _ -> None

With some addition of event tag and maybe event families list in the shared event_data struct for further matching.

I think I’ve been able to nail it:

type event_id = EventID of int
type user_id = UserID of int
type session_id = SessionID of int
type ts = Timestamp of int
type event_data = { id : event_id; ts : ts }
type user_data = { id : user_id }
type session_data = { id : session_id }
type config_data = private { config : int }

type 'a omnievent = {
  typ : 'a;
  event : event_data;
  user : user_data option;
  session : session_data option;
  config : config_data option;
}

type user_event = [ `UserJoin | `UserNew ]
type meta_event = [ `Tick ]
type session_event = [ `SessionStart | `UserJoin ]
type any_event = [ user_event | session_event | meta_event ]

let make (type typ) typ ?user ?session ?config event : typ omnievent =
  { typ; event; user; session; config }

let get_id (e : any_event omnievent) : event_id = e.event.id

let get_user_id : user_event omnievent -> user_id option = function
  | { user = Some user_data; _ } -> Some user_data.id
  | _ -> None

let get_session_id : session_event omnievent -> session_id option =
  function
  | { session = Some session; _ } -> Some session.id
  | _ -> None

let tick_event = make `Tick { id = EventID 1; ts = 1 }

let user_join_event =
  make `UserJoin ~session:{ id = SessionID 1 } { id = EventID 1; ts = 1 }

let user_new_event = make `UserNew { id = EventID 1; ts = 1 }

let session_start_event =
  make `SessionStart ~session:{ id = SessionID 1 } { id = EventID 1; ts = 1 }

let tick_event_id : event_id = get_id tick_event
let user_join_event_id : event_id = get_id user_join_event
let user_new_event_id : event_id = get_id user_new_event
let session_start_event_id : event_id = get_id session_start_event
let user_join_user_id : user_id option = get_user_id user_join_event
let user_new_user_id : user_id option = get_user_id user_new_event

(*(* Compile error  *)
let tick_event_user_id : user_id option = get_user_id tick_event
let session_start_user_id : user_id option = get_user_id session_start_event
*)

let user_join_session_id : session_id option = get_session_id user_join_event

let session_start_session_id : session_id option =
  get_session_id session_start_event
(*(* Error - doesn't have session data *)
let tick_session_id : session_id = get_session_id tick_event
let user_new_session_id : session_id option = get_session_id user_new_event
*)

It still needs some adjustments, as I would like to ensure sanity when creating events (which make me end up breaking omnievent into smaller bits, but well), but at least I’m getting there.

Even though I’ll try to make this version work If anyone wants to comment or criticize feel free to do so :slight_smile:

2 Likes

You might also want to look into phantom types using polymorphic variants to typecheck correct sequences of actions - I find this method much easier to read than GADTs: Behavioural types · KC Sivaramakrishnan

Otoh. you are the one guaranteeing that the implementation behind the phantom-type interface matches semantically - but I havn’t yet found that to be a problem.

A point that is important concerning the comparison of GADTs and phantom-types is that as many people find GADTs “magical” / hard-to-read - they are not able to really understand what behaviour the GADTs describe. So if phantom-types are easier to read for more people - the code is in effect more safe, even though GADTs give you more guarantees automatically. A guarantee only makes sense if people understand that guarantee.

2 Likes

Took me some time to circle and test few more ideas, but even though I got some of the results I kept finding issues with it. Behavioural types is still something I’d like to explore, but I believe that @art-w solution is the best approach for the problem outside of pre-processing.

It was quite an educative journey :slight_smile:

1 Like