[AI experiment] attempt at message-templates implementation

A few years ago I made an arguably misguided request to get message-template support added to the Logs library. Ever since then I have felt that this should exist in the ecosystem, even though it is not a natural fit given OCaml’s type erasure at compile time.

Giving a library like this a nice user experience felt like it would require a PPX, which was a hill I was not motivated enough to climb in my free time, so I made no progress on adding it to the ecosystem myself. In the past I tried using AI/LLMs to get this project going, but they were never good enough, especially at producing working OCaml code. Fast forward to today: AI agents and models have become reasonably capable, so I figured I would give it another try. You can see the outcome of this latest experiment here.

Given that the library relies heavily on Obj to inspect tags at runtime, I would not suggest anyone use this, at least not in its current state. I still found it an interesting exercise, and knowing how much the tooling and its capabilities have improved, I would feel more comfortable reaching for these tools in the future.
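For readers unfamiliar with the idea, a message template is a log format string with named holes, where the arguments are kept as structured values rather than flattened into text. The sketch below is a hypothetical illustration of the concept only, not the experiment’s actual API: it sidesteps the Obj-based runtime inspection by making the caller wrap values in an explicit variant.

```ocaml
(* Hypothetical sketch of message-template rendering; the real library
   avoids this explicit [value] variant by inspecting tags via Obj. *)
type value =
  | String of string
  | Int of int

let render_value = function
  | String s -> s
  | Int i -> string_of_int i

(* Replace every occurrence of [needle] in [haystack] with [subst]. *)
let replace_all haystack needle subst =
  let nlen = String.length needle in
  let hlen = String.length haystack in
  let buf = Buffer.create hlen in
  let i = ref 0 in
  while !i < hlen do
    if !i + nlen <= hlen && String.sub haystack !i nlen = needle then begin
      Buffer.add_string buf subst;
      i := !i + nlen
    end else begin
      Buffer.add_char buf haystack.[!i];
      incr i
    end
  done;
  Buffer.contents buf

(* Fill each {name} hole in [template] from the association list. *)
let render template args =
  List.fold_left
    (fun acc (name, v) ->
      replace_all acc ("{" ^ name ^ "}") (render_value v))
    template args

let () =
  print_endline
    (render "User {user} logged in from {ip}"
       [ ("user", String "alice"); ("ip", String "10.0.0.1") ])
```

The point of the pattern is that the `(name, value)` pairs remain available to a structured sink (JSON, a database) even after the human-readable text is rendered; a nice API would hide the `value` wrapping behind a PPX, which is exactly the hill mentioned above.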

People have already noted these things elsewhere on the internet, but I got the most value and quality out of it by:

  • Keeping threads to around 100k tokens at most; beyond that, just start a new thread
  • Having the model create a detailed implementation plan before starting, which improves the success rate; if you keep the plans around in numbered order, you end up with something similar to ADRs (Architectural Decision Records)
  • Providing as much detail and as many keywords as possible when writing prompts, while remaining succinct

My biggest takeaways from this are probably:

  • AI is great at scaffolding a new project and getting started. No more staring at a blank screen with writer’s block
  • It makes refactoring quick, and with a strong type system you can be confident nothing obvious broke
  • You can get reasonable test coverage without tearing your hair out
  • AI rips through esoteric dune problems like nobody’s business
  • You still cannot trust it to make the right decisions
  • The more competent the guiding hand, the faster it can move and make progress

Sure, it can be useful. But your responsibility is to create hard guardrails, as in these two articles:

I needed a utility to use with LLM agents, to spare the context window. I created it in 24 hours using the Codex app with GPT 5.3 Codex, following this process (each step in a separate session):

  • From the idea and tech stack, Codex created a PRD. I answered several open questions and made architectural corrections.
  • Repeatedly asked Codex to review the PRD, and we fixed it together.
  • Created an implementation plan from the PRD.
  • Repeatedly asked Codex to implement the next stage. AGENTS.md contained an instruction to always build, run the tests (and add more tests), then commit and push at the end of each step.
  • A couple of times I tested it by hand and asked Codex to change behavior here and there.
  • Repeatedly asked Codex to find weaknesses against the intended behavior, add test cases, and apply fixes.
  • Asked Codex to release it.
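The build-test-commit instruction described above might look something like this in AGENTS.md (a hypothetical sketch, not the project’s actual file):

```markdown
# AGENTS.md (sketch)

After finishing each implementation step:

1. Build the project.
2. Run the full test suite; add tests covering any new behavior.
3. Commit and push only once the build and all tests pass.
```

Keeping the loop in the agent’s standing instructions, rather than repeating it in every prompt, is what makes each "implement the next stage" session land in a known-good state.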

It wasn’t vibe coding, it was harness engineering. The result can be found here: GitHub - s-kostyaev/oq

This is not a very big project, but the same principles can be scaled.

P.S. This post was written by a human (me).