Vibe-coding in OCaml: a bleg from a retiree

I’m retired, so all my coding is … “for fun”. I got out a bit before this new LLM thing, so luckily I haven’t had a need to learn how it works. I maintain and hack on some decent-sized projects (one of them has 11 versions of the OCaml parser, 4.10-5.5 inclusive; another has every version of the OCaml AST starting at 4.02), and sometimes I wonder if agentic coding could help make doing that hacking more efficient, faster, whatever. The problem is, I’m not going to shell out $200/mo just to find out that it isn’t helpful, when none of this is paid work. And heck, part of the pleasure, part of the fun, is doing the actual hacking.

Even so, I do wonder what this AI assist is about. From time to time I see people talking about how they used AI to help them write OCaml code, but I don’t know what that really consists of. Was the AI helping them with really mundane BS? Or helping them with some thorny problems? I was recently hacking on the OCaml grammar (parser.mly) as part of a project, and I wondered if AI could have helped with this. It’s difficult to judge, and I’m reluctant to make the jump without knowing what value I could get out of it, since the value would have to be purely experiential, as I don’t get paid to do this.

So I wondered if there were people who recorded videos of them using vibe coding to solve problems. Maybe with a voiceover (could be them talking thru what they’re doing, not some sort of post-processed voiceover explaining after-the-fact what they’re doing).

Heck, if they were videos of people doing that with OCaml, that’d be icing on the cake, but really, in any language: C, C++, Python, Perl, SQL, whatever, really.

I confess also that when I read stuff like the head of Anthropic saying “Oh, I no longer need to write code, I can just have Claude do all my coding for me”, I smell sulfur, as in gaslighting. That’s another reason I’d like to see these videos.

Does anybody have any pointers to share? Advice?

I am a Python refugee who started learning OCaml to try and get actually good at programming and, ironically, to avoid AI. Then I got a part-time gig doing AI work right around November, when the models got very good, and I kept using it after the gig was through.

I find AI extremely productive, but it’s hard to parse through the hype, so I try to ignore most of it. I really like using OCaml with AI. The compiler is well suited to it, I find. Go is probably a stronger language to vibe code in because of its large data set, but I think OCaml is better simply because of typing and pattern matching.

I use Synthetic and Neuralwatt as providers. Between the two I’m able to get some pretty heavy vibecoding use for 50/mo. Word on the street is that Synthetic is profitable. I like the Pi harness, but OpenCode is good too.

I have like 40 abandoned projects, I’d say about 5 I actively use for coding, and two projects which get a lot of use from friends, family and coworkers. One is a simple calculator app, and the other is a pretty basic SSG which dumps media files in a S3 bucket. Nothing revolutionary, but stuff which is niche enough that I would not have hand coded it or finished the side project. A lot of those abandoned projects were iterations to build the tooling that I’m currently using.

I still would like to get better at handcoding OCaml. I did one Dream app by hand. I feel like I’m still learning FP concepts.

I have a sense that there may be some real utility to be found by using something like Coq to create a formally verifiable software core which can then allow for vibe coded agentic patterns on top.

I also kind of have a sense that OCaml effects may allow for an Erlang-type “let it fail” code architecture which could be very productive with AI. But I don’t know much about either of those domains, so that may very well just be the AI psychosis talking.

All in all, I’d say give it a shot. I think the hardest part in staying productive is not derailing yourself to build 7 new code harnesses.

I could probably record a video if you think that’d be helpful. I have a very goofy workflow though so I may not be the best demo.

I don’t do videos, but I did write up several blog posts about my December of agentic coding in OCaml here so I could learn what it’s all about.

Oh, I had somehow not seen (or didn’t remember) this. I’ll read it in detail! Thank you, Anil!

I sent a direct message, but I thought I should say that I’m not looking for someone to record a video – that would be -work-, and if anybody’s going to do -work-, it should be me. I’m really just looking for some existing set of videos or other things. Anil’s blog posts are a good start, and haha, they’re even for OCaml!

Try this:

  1. Pick a small fun idea you haven’t had time to build. Or energy.
  2. Open up Claude, Codex, whatever agent. Preferably one of the new frontier ones.
  3. Pick the best, most expensive model, like opus 4.8. Use the default effort.
  4. Always start in plan mode. That’s a shift + tab to switch modes. Design and iterate. Be more goal oriented first. Prompt the agent to give you some options with trade-offs. Prompt it to ask you questions. Have fun and iterate until the plan looks decent and can achieve your goal.
  5. Prompt it to break the work apart into smaller commits or legs of work too, if you want.
  6. Approve the plan. Let it run.
  7. Prompt the agent to review your code.

That’s the gist of my workflow. Apply your favorite programming techniques as you go. Smaller work for each commit. Test-driven development.

The key to all this is getting your environment set up so that the agent has a tight feedback loop and can course correct quickly without writing gibberish. Ask it to write tests in between pieces of work. Compile and run the linter. Formatters. Refactor.

Unsure? Ask it for suggestions.

I can now build so much more tooling and do all the polishing that I wish I could’ve done before in all my work.

I have multiple agents review code and have them fix - I still mold the code into good designs. I still review, but not every line anymore. I still write tests and mull on trade-offs, plan timelines, but now I can do so much more in parallel.

I’ve been building a new Ruby typechecker in Rust.

There’s a real opportunity here to flesh out missing pieces of the OCaml ecosystem for those who have the time (and tokens) to spare. AI has gotten to the level that it does serious programming quickly and well. Most programmers no longer program by hand, but rather make design decisions and review code.

Personally, I program in OCaml for fun and I enjoy doing things without AI, but for example, I started an AI-based SDL3 package (which I need to get back to and complete).

I think if we can prioritize missing pieces in the ecosystem (I’ve tried to do it on ocamlverse here, but it’s now out of date), AI can really take OCaml to the next level.

Here is a one-pager explaining my workflow (5 minute read): 4ward/docs/AI_WORKFLOW.md at main · smolkaj/4ward · GitHub

Key insight (due to Peter Steinberger): use the Socratic method! Why is explained in the doc.

Beyond that, the single most important thing is to put in place deterministic guardrails, i.e. CI: linters, formatters, compilers, unit tests, end-to-end tests, … Is there perhaps an existing implementation you can diff-test against? Can you apply property-based testing? Fuzzing? Is there a specification you can use as a basis to generate a test corpus?
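
To make the diff-testing idea concrete, here is a minimal sketch using only the OCaml standard library. Everything in it is an illustrative assumption rather than something from the linked project: `my_sort` stands in for the (possibly agent-written) code under test, `List.sort` plays the role of the existing implementation, and `diff_test`, `random_list` and the seed are made-up names and values. A real setup would more likely use a property-testing library, but the shape is the same.

```ocaml
(* Hypothetical code under test: a hand-rolled insertion sort. *)
let rec insert (x : int) (sorted : int list) : int list =
  match sorted with
  | [] -> [ x ]
  | y :: rest -> if x <= y then x :: sorted else y :: insert x rest

let my_sort (xs : int list) : int list =
  List.fold_left (fun acc x -> insert x acc) [] xs

(* Reference implementation to diff against: the standard library sort. *)
let reference_sort (xs : int list) : int list = List.sort Int.compare xs

(* Generate a random list of 0 to 19 ints in [0, 100), from an explicit
   random state so that runs are reproducible for a given seed. *)
let random_list (st : Random.State.t) : int list =
  let len = Random.State.int st 20 in
  List.init len (fun _ -> Random.State.int st 100)

(* Run [trials] random inputs through both implementations and return the
   first input on which they disagree, or [None] if they always agree. *)
let diff_test ~(trials : int) ~(seed : int) : int list option =
  let st = Random.State.make [| seed |] in
  let rec loop n =
    if n <= 0 then None
    else
      let input = random_list st in
      if List.equal Int.equal (my_sort input) (reference_sort input) then
        loop (n - 1)
      else Some input
  in
  loop trials

let () =
  match diff_test ~trials:1000 ~seed:42 with
  | None -> print_endline "no mismatch found"
  | Some input ->
      Printf.printf "mismatch on an input of length %d\n" (List.length input)
```

The point of returning the failing input rather than just a boolean is that the agent (or you) gets a concrete counterexample to work from, which keeps the feedback loop deterministic.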

Here is my AGENTS.md from a mature, large project that was entirely written by AI: 4ward/AGENTS.md at main · smolkaj/4ward · GitHub

That said, I’d strongly suggest starting from an empty AGENTS.md and having the agent add things as you go to tweak agent behavior.

Mostly agree, except I personally hate planning mode. I prefer to just have a natural conversation with the agent.

The problem with planning mode is that it has the agent ask you questions. I prefer to steer the agent by asking THEM questions. It’s more efficient and forces the agent to understand the why, not just the what. That way they can course correct and reason much more independently.

DISCLAIMER: This insight is not my own, it’s due to Peter Steinberger.

By the way @Chet_Murthy: what you are asking about is very likely not “vibe coding”, but “agentic engineering”. Both terms were coined by Andrej Karpathy: https://x.com/karpathy/status/2019137879310836075?s=20

I have used it on a few projects, both for planning (smaws) and full-blown implementation over dozens of iterations (ocgtk). I’ve found that frontier models and cheaper ones are both fully capable of writing OCaml syntax and planning using most of its features (including GADTs and functors, but more so when pushed to use advanced features rather than spontaneously).

I tend to start with a relatively small AGENTS.md for most projects (and keep it that way as it will be unhelpful at lengths >150 lines).

The biggest issue I encounter repeatedly is the pathological coding patterns it favours when solving problems, such as deeply nested match statements, use of polymorphic equality operators and unsafe (exception-throwing) standard library functions. It requires both explicit guidelines and mechanical code checks to correct these, as they can get out of hand quickly and coding agents will prefer existing code over instructions as their guide (the former overwhelms the context if not strictly managed).
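
For readers who haven’t run into these patterns, here is a small sketch of what the preferred alternatives might look like. The function names (`first_is`, `describe`, `lookup`) are made up for illustration and don’t come from any of the projects mentioned; only the standard library calls are real.

```ocaml
(* Pattern to avoid (shown as a comment only):
     let first_is target xs = List.hd xs = target
   List.hd raises Failure on an empty list, and (=) is polymorphic
   structural equality. *)

(* Preferred: one flat match plus a type-specific equality function. *)
let first_is (target : string) (xs : string list) : bool =
  match xs with
  | [] -> false
  | x :: _ -> String.equal x target

(* Instead of nesting one match inside another, match on a tuple so that
   every case is visible at a single level. *)
let describe (a : int option) (b : int option) : string =
  match a, b with
  | Some x, Some y -> Printf.sprintf "both: %d" (x + y)
  | Some x, None | None, Some x -> Printf.sprintf "one: %d" x
  | None, None -> "none"

(* Option-returning lookup instead of List.assoc, which raises Not_found. *)
let lookup (key : string) (env : (string * int) list) : int option =
  List.assoc_opt key env
```

These are the kinds of rewrites that a guideline in AGENTS.md can ask for and that a mechanical check can then enforce.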

All the advice above about planning first and using review gates is good. I favour larger models for planning, and find good results using a secondary, different model for reviewing plans (e.g. Opus or Sonnet for writing a plan and Gemini Flash for review). Interrogate the plan yourself, and ask the AI to explain parts that make no sense (Anthropic models are particularly fond of plans that only make sense to the agent planning it). Always ask it to ask you questions when writing the plan, as it will favour its own decision making otherwise, even for areas that are highly ambiguous.

Most implementation can be done with cheaper models, but the amount of code they can produce in a 2-hour sitting is challenging to review carefully by hand. I use multiple review agents, preferably each focussed on one aspect of the code (compliance to spec, code guidelines, safety behavior) and always of a different model (using a frontier model for at least one review step sadly still seems to be necessary). I try to do this before I review the code myself to reduce the amount of nonsense I’m forcing myself to wade through.

In terms of mechanical gates: ensure your executors always build, test and format your code as part of their cycle (these go in your AGENTS.md). Always run the result yourself: tested but broken outputs are very common. I’m starting to see good results with semgrep to filter out the pathological behaviours I describe above.

I very much do ask the agent questions early on in planning mode too! The main reason I use plan mode is because I don’t want the agent to start writing code until I’m ready.

Which models do you typically use? If I don’t have to use planning to avoid a trigger happy agent, then I won’t. One less key to hit!

I’m still on Claude Opus 4.6. I haven’t had issues with it being overeager like some other models :)

I find AI tools more useful to crank out code that I don’t really want to write otherwise. There’s a handful of Android apps that I use regularly, some paid, and they don’t do quite what I want.

I could sit down for a month or two and write apps for Android, but I find coding for that pretty tedious and frustrating. So it never happens. But with Claude Code I can have a replacement for an app ready in two hours. And it’s customized to do exactly what I want (like not violate my privacy).

I don’t look at the code they produce at all. I think it’s mostly Java. In one case I had it vendor and build ffmpeg instead of using the system media decoder (I wanted to add realtime EQ). I thought that was hilarious since it would have taken me like a month of fiddling just to get the build incantations right and wire Java up to be able to call into the ffmpeg C ABI.

I could release the apps, but why? I’m a market of one. And if anyone really wants it, they can just make their own bespoke version.

For OCaml it’s a bit different. I usually recruit them to help me with more tedious stuff, like I mentioned in the awso announcement: refactors and evaluating how to reduce the dependency cone, for example. In those cases I look at every line and often do fix them up to meet my standards.