I’ve put together an ocaml-ai-disclosure proposal to allow voluntary disclosure of AI usage in published OCaml code, using opam metadata and extension attributes in source code.
The repository and blog post have more details, some prototype tooling to extract attributes, and a FAQ, but in a nutshell I’m proposing something very similar to a W3C disclosure proposal for HTML.
Package Disclosures
An opam package can declare its disclosure using extension fields:
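For example, a sketch of what this might look like (the x-ai-* field names below are illustrative placeholders for whatever gets standardized; opam does reserve the x- prefix for user-defined extension fields):

```
opam-version: "2.0"
synopsis: "An example package"
# ... the usual fields ...
# Hypothetical extension fields for AI disclosure:
x-ai-disclosure: "assisted"
x-ai-models: ["anthropic-claude-opus-4-6" "some-other-model"]
```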
I couldn’t find prior art from other language ecosystems trying anything similar, so I’d be interested in hearing about any others you all know about. If there’s no interest in the wider ecosystem in doing this, then I’ll just use it myself, but I figured there’s no harm in starting the discussion!
No need to sprinkle annotations across the code; just add it to the opam file and it’ll tag the whole repo as well. It’s also just convenient to know which model etc. was used.
As simple as repeated attributes; the opam-ai-disclosure plugin picks that up and makes a list.
e.g. with a toplevel attribute in the opam-ai-disclosure plugin:
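A sketch (the ai.* attribute names are placeholders; the [@@@ ...] floating-attribute syntax itself is standard OCaml and is ignored by the compiler unless tooling consumes it):

```ocaml
(* Hypothetical floating attributes at the top of a module,
   to be picked up by the disclosure tooling. *)
[@@@ai.disclosure "assisted"]
[@@@ai.model "anthropic-claude-opus-4-6"]
[@@@ai.model "some-other-model"] (* repeated attributes accumulate into a list *)
```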
I’m not sure why we want OCaml-specific machinery for something like this.
Seems like REUSE and its tooling are a better model, especially when codebases can be heterogeneous, with components shifting between languages, and use language-agnostic build tools like Make, Bazel, Buck2, etc. (or several build tools together).
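For reference, REUSE works through per-file SPDX comment headers plus repo-level tooling (`reuse lint` checks coverage); in an OCaml source file such a header looks like the sketch below, and a disclosure tag could in principle ride along in the same place (the names here are made up):

```ocaml
(* SPDX-FileCopyrightText: 2025 Example Author <author@example.com>
   SPDX-License-Identifier: ISC *)
```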
Would love to use something like this, and I think having the ability to use module-level annotations is great.
That being said, I agree with @henrytill that the way packages disclose use of AI should not be dependent on OCaml or dune.
Module-level annotations should help populate a toml/json/whatever file that describes the use of AI within a codebase. (The same way the license field of an opam file doesn’t remove the need to write a LICENSE file that is independent of OCaml.)
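For instance, the extracted annotations could be aggregated into something like the following (a purely hypothetical shape, filename, and schema, for illustration only):

```json
{
  "schema": "ai-disclosure/1",
  "modules": [
    {
      "path": "src/parser.ml",
      "disclosure": "assisted",
      "models": ["anthropic-claude-opus-4-6"]
    }
  ]
}
```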
Since you bring up licensing: I’m wondering how much of this couldn’t simply be covered by the copyright holder and/or license details, since that’s where all the interesting legalities are going to happen in the future.
More precisely, I’d be happier trying to frame this as an SPDX license exception (WITH):
(** SPDX-License-Identifier: ISC WITH x-anthropic-claude-opus-4-6 *)
I like the idea of tying this to licenses. The SPDX entry is per file, which is more realistic than one per project for capturing how this evolves.
That being said, I am skeptical about the overall benefit. The open source movement and its vocal license advocates have little to show for enforcing licenses when AI models have absorbed the intellectual property in a way that was unforeseen. How is that going to change?
While at the moment it’s quite easy to recognize generated code slop (no character, pointless convolutions, longer than needed, etc.), if that manages to evolve I’m more interested in sources being explicitly tagged as radioactive liability material than in having the proper legal argument.
This is already possible within the SPDX license spec (since v3):
ISC WITH AdditionRef-anthropic-claude-opus-4-6
opam supports the full spec and will not raise any warning with this syntax (since opam 2.4, or 2.2 if you build it with the latest version of the spdx_licenses library)
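Concretely, an opam file could then carry this today (a sketch; only the license: line is the point here, relying on the claim above that opam ≥ 2.4 accepts the full SPDX expression syntax):

```
license: "ISC WITH AdditionRef-anthropic-claude-opus-4-6"
```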
Companies are pushing out new models in short periods of time. If multiple models are used and different contributors use different vendors, the list can get long quickly. In-code annotations might not be ideal in the long run.
I am a firm believer that outright refusal should be considered a viable option for dealing with LLM output in the ecosystem, say, for example, in the opam repo.
Although, as far as I understand, there is no precedent for opinionated decisions in the repo, and a repo is quite different from a single project or even a language implementation.
If this relies on voluntary declaration rather than detection, it’s a losing battle: it can’t be enforced, and what counts can’t even be decided. The spectrum of what constitutes LLM output is too wide and the potential for AI in software development too exciting.
This argument, that it is impossible to practically enforce, is often made against LLM-content bans, and you will see versions of it in the discussions I linked. In my opinion, it misses the point. The point isn’t technical, it’s sociological: to set expectations of conduct in the community.
There are sufficient responses to it in those links, but I will attempt my own here in the interest of discussion.
I implore you to consider: voluntary disclosure, regular licensed software contributions, and locking your door when leaving your dwelling unattended all rely on trust boundaries and majority good faith; that is, on the actor not behaving subversively to avoid disclosure, plagiarize differently-licensed code, or bypass your lock with a weirdly bent hairpin, respectively. In much the same way, an LLM-ban policy would set a boundary and assume good faith.
A ban would set playing rules for those interested in playing fair. And that is the majority of our community.
It would go further than disclosure, however, in standing guard against low-effort contributions. Someone trying to subvert the rule would, ironically, have to go into their contribution and put in effort to make it look like the work of a human, the same human submitting it, and to be able to show they understand the output when challenged on it. This is effective enough to stop a large amount of slop from ever having to be dealt with.
It also gives maintainers a stronger liability shield and empowers them to reject thankless work effortlessly. At least output used to somewhat correlate with effort when it was authentic rather than generated.
“the potential for AI in software development too exciting”
Exciting as well was the potential for plastics in the 90s. As Bender would say.
Perhaps if there had been a stronger, more widespread refusal of plastics at the time, more ecologically viable alternatives would’ve been developed much sooner, bypassing today’s ecological crises entirely.
There may be an anxiety about keeping up with “relevance” and “progress” that motivates a looser grip on LLM content, but my impression of the OCaml community has been that we’ve always been steady and forward-looking, valuing high-quality solutions (technical or otherwise) over moving fast and breaking things.
None of these communities try to enforce a full ban on LLM-generated code across the whole ecosystem. The way the opam-repo works is sometimes opinionated, but is it its role to enforce such decisions on everyone?
I did point out it’s unprecedented but worth considering nonetheless.
You could think of the opam-repo as a project, not a registry of the “whole ecosystem” but a (blessed, central, community-maintained) set of packages which participants in the ecosystem can choose to submit their work to and the repo maintainers can choose to reject for any reason.
opam is designed such that this central repo is a convenience, not a requirement (opam pin, opam repo, etc.). That means it is absolutely plausible for opam-repo maintainers to be opinionated and reflect an “official” position. Neutrality shouldn’t be pushed on upstream as a non-negotiable, because the tool itself affords you the choice to expand on upstream trivially.
The closest analogy to opam’s function in “the ecosystem” and the proposed stance would be Gentoo’s position.