A Proposal for Voluntary AI Disclosure in OCaml Code

Dear all,

I’ve put together an ocaml-ai-disclosure proposal to allow voluntary disclosure of AI usage in published OCaml code, using opam metadata and extension attributes in source code.

The repository and blog post have more details, some prototype tooling to extract attributes, and a FAQ, but in a nutshell I’m proposing something very similar to a W3C disclosure proposal for HTML.

Package Disclosures

An opam package can declare its disclosure using extension fields:

x-ai-disclosure: "ai-assisted"
x-ai-model: "claude-opus-4-6"
x-ai-provider: "Anthropic"

Note: This may just become a list of values in the final proposal, but you get the idea.
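To make the package-level fields concrete, here is a minimal sketch of how a tool might pull the x-ai-* fields out of an opam file. It is a naive, stdlib-only line scanner written for illustration (OCaml ≥ 4.13 for `String.starts_with`). `parse_field` and `ai_fields` are hypothetical names, and a real implementation would more likely use a proper opam file parser.

```ocaml
(* Naive, stdlib-only sketch: it only understands single-line fields of the
   form  x-ai-KEY: "VALUE"  and skips everything else (multi-line values,
   lists, comments). A real tool should use a proper opam file parser. *)

let prefix = "x-ai-"

(* [parse_field line] returns [Some (key, value)] for a line such as
   x-ai-model: "claude-opus-4-6", and [None] otherwise. *)
let parse_field (line : string) : (string * string) option =
  let line = String.trim line in
  if not (String.starts_with ~prefix line) then None
  else
    match String.index_opt line ':' with
    | None -> None
    | Some colon ->
      let plen = String.length prefix in
      let key = String.sub line plen (colon - plen) in
      let rest =
        String.trim
          (String.sub line (colon + 1) (String.length line - colon - 1))
      in
      let n = String.length rest in
      (* Accept only a single double-quoted string as the value. *)
      if n >= 2 && rest.[0] = '"' && rest.[n - 1] = '"' then
        Some (key, String.sub rest 1 (n - 2))
      else None

(* Every x-ai-* field in the text of an opam file, in source order. *)
let ai_fields (opam_text : string) : (string * string) list =
  String.split_on_char '\n' opam_text |> List.filter_map parse_field
```

Because the scanner returns a list in source order, repeated fields (if the final proposal turns these into lists of values) would simply show up as repeated entries.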

OCaml Module level

OCaml supports extension attributes, which we use via a floating attribute that applies to the entire compilation unit:

[@@@ai_disclosure "ai-generated"]
[@@@ai_model "claude-opus-4-6"]
[@@@ai_provider "Anthropic"]

let foo = ...
let bar = ...

These can also be scoped more finely via declaration attributes that apply to a single binding:

[@@@ai_disclosure "ai-assisted"]

let human_written x = ...

let ai_helper y = ...
[@@ai_disclosure "ai-generated"]

Disclosure follows a nearest-ancestor inheritance model like the W3C HTML proposal, whereby an explicit annotation overrides the inherited value.
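The nearest-ancestor rule can be modelled as a small tree walk. This is a hypothetical sketch, not the proposal's tooling: the `scope` type and the `resolve`/`effective` functions are invented for illustration, and the example tree mirrors the one above (a unit-level "ai-assisted" with one binding overriding it).

```ocaml
(* Hypothetical model of nearest-ancestor inheritance: each scope (package,
   compilation unit, binding) may carry an explicit disclosure; if it does
   not, it inherits the value of its closest enclosing scope. *)

type disclosure = string (* e.g. "ai-assisted" or "ai-generated" *)

type scope = {
  name : string;
  explicit : disclosure option; (* from an opam field or an attribute *)
  children : scope list;
}

(* Walk the tree depth-first, carrying the value in force at the parent.
   [None] means nothing has been disclosed at this level or above. *)
let rec resolve (inherited : disclosure option) (s : scope) :
    (string * disclosure option) list =
  let here = match s.explicit with Some d -> Some d | None -> inherited in
  (s.name, here) :: List.concat_map (resolve here) s.children

let effective (root : scope) = resolve None root

(* Mirrors the example above: a unit-level [@@@ai_disclosure "ai-assisted"]
   with one binding overriding it via [@@ai_disclosure "ai-generated"]. *)
let example =
  {
    name = "Example";
    explicit = Some "ai-assisted";
    children =
      [
        { name = "human_written"; explicit = None; children = [] };
        { name = "ai_helper"; explicit = Some "ai-generated"; children = [] };
      ];
  }

(* effective example =
     [("Example", Some "ai-assisted");
      ("human_written", Some "ai-assisted");
      ("ai_helper", Some "ai-generated")] *)
```

Note that `human_written` has no attribute of its own, so it picks up the unit-level "ai-assisted"; only an explicit annotation on the binding changes that.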

I wrote a blog post with more details, as well as an FAQ in the proposal repository about some of the implications.

I couldn’t find any prior art from other language ecosystems trying anything similar, so I’d be interested in hearing about any that you know of. If there’s no interest in doing this in the wider ecosystem, I’ll just use it myself, but I figured there’s no harm in starting the discussion!


I’m quite open to using this, although applying it retrospectively across a codebase could be tricky.

One issue I have is that I’ve often used multiple models in my process. Is there a way of specifying that as a list?

There’s no need to sprinkle it across the code; just add it to the opam file and it’ll tag the whole repo. It’s also just convenient to know which model etc. was used.

It’s as simple as repeating the attributes; the opam-ai-disclosure plugin picks them up and makes a list.

e.g. with toplevel attributes in the opam-ai-disclosure plugin’s own source:

[@@@ai_model "claude-opus-4-6"]
[@@@ai_provider "Anthropic"]
[@@@ai_model "qwen-3.5"]
[@@@ai_provider "alibaba"]

> ./_build/default/bin/main.exe scan .
opam-ai-disclosure dev: ai-assisted (model=claude-opus-4-6, provider=Anthropic)
  Ai_disclosure (impl): ai-assisted (model=claude-opus-4-6, model=qwen-3.5, provider=Anthropic, provider=alibaba)
  Ai_disclosure (intf): ai-assisted (model=claude-opus-4-6, provider=Anthropic)
  Dune__exe__Main: ai-assisted
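One way to get list behaviour from repeated attributes is a simple fold. The sketch below is an assumption about how such aggregation could work, not the plugin’s actual code: `summarize` and `render` are hypothetical, it assumes a later `ai_disclosure` overrides an earlier one, and `~default` stands in for the value inherited from the package (which is why a module without its own attributes still reports ai-assisted).

```ocaml
(* Hypothetical sketch: fold repeated (key, value) attribute pairs into
   lists, preserving source order, and render them in the
   "disclosure (model=..., provider=...)" style of the scan output. *)

type summary = {
  disclosure : string option;
  models : string list;
  providers : string list;
}

let empty = { disclosure = None; models = []; providers = [] }

(* Assumption: a later ai_disclosure overrides an earlier one, while models
   and providers accumulate. Lists are built reversed and fixed up below. *)
let add s (key, value) =
  match key with
  | "ai_disclosure" -> { s with disclosure = Some value }
  | "ai_model" -> { s with models = value :: s.models }
  | "ai_provider" -> { s with providers = value :: s.providers }
  | _ -> s (* ignore unrelated attributes *)

let summarize attrs =
  let s = List.fold_left add empty attrs in
  { s with models = List.rev s.models; providers = List.rev s.providers }

(* [default] is the disclosure inherited from an enclosing scope. *)
let render ~default s =
  let disclosure = Option.value s.disclosure ~default in
  let details =
    List.map (fun m -> "model=" ^ m) s.models
    @ List.map (fun p -> "provider=" ^ p) s.providers
  in
  if details = [] then disclosure
  else Printf.sprintf "%s (%s)" disclosure (String.concat ", " details)

(* Example: the repeated attributes above, with the package-level value as
   the default. Prints:
   ai-assisted (model=claude-opus-4-6, model=qwen-3.5, provider=Anthropic, provider=alibaba) *)
let () =
  [ ("ai_model", "claude-opus-4-6"); ("ai_provider", "Anthropic");
    ("ai_model", "qwen-3.5"); ("ai_provider", "alibaba") ]
  |> summarize
  |> render ~default:"ai-assisted"
  |> print_endline
```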

I’m not sure why we want OCaml-specific machinery for something like this.

Seems like REUSE and its tooling are a better model, especially when codebases can be heterogeneous, with components shifting between languages, and use language-agnostic build tools like Make, Bazel, Buck2, etc. (or use several build tools together).


Would love to use something like this, and I think having the ability to use module-level annotations is great.
That being said, I agree with @henrytill that the way packages disclose use of AI should not be dependent on OCaml or dune.

Module-level annotations should help populate a toml/json/whatever file that describes the use of AI within a codebase. (In the same way, the license field of an opam file doesn’t remove the need for a LICENSE file that is independent of OCaml.)
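As a rough illustration of that idea, module-level annotations could be exported into a language-agnostic file. This sketch emits JSON using only the standard library; the schema (`path`, `disclosure`, `models`) and the function names are hypothetical, since no such file format has been agreed on.

```ocaml
(* Hypothetical sketch: turn per-file disclosures into a small JSON document.
   The schema (field names, layout) is invented for illustration. *)

type entry = {
  path : string;         (* source file, e.g. src/foo.ml *)
  disclosure : string;   (* e.g. ai-assisted *)
  models : string list;  (* zero or more model identifiers *)
}

(* Minimal JSON string escaping: quotes, backslashes, control characters. *)
let json_string s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | c when Char.code c < 0x20 ->
        Buffer.add_string b (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

let json_list f xs = "[" ^ String.concat "," (List.map f xs) ^ "]"

let entry_to_json e =
  Printf.sprintf "{\"path\":%s,\"disclosure\":%s,\"models\":%s}"
    (json_string e.path) (json_string e.disclosure)
    (json_list json_string e.models)

(* The whole file: a JSON array with one object per source file. *)
let to_json entries = json_list entry_to_json entries
```

Keeping the output to plain JSON means non-OCaml tooling (or a REUSE-style checker) could consume it without knowing anything about opam or dune.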

Since you bring up licensing, I’m wondering how much of this could simply be covered by the copyright holder and/or license details, since that’s where all the interesting legalities are going to happen in the future.

More precisely, I’d be happier to try to frame this as an SPDX license exception (WITH):

(** SPDX-License-Identifier: ISC WITH x-anthropic-claude-opus-4-6 *)

I like the idea of tying this to licenses. The SPDX entry is per file, which is more realistic than one per project to capture how this evolves.

That being said, I am skeptical about the overall benefit. The open source movement and its vocal license advocates have little to show for enforcing licenses when AI models have absorbed the intellectual property in a way that was unforeseen. How is that going to change?

While at the moment it’s quite easy to recognize generated code slop (no character, stupid convolutions, longer than needed, etc.), if that manages to evolve, I’m more interested in sources being explicitly tagged as radioactive liability material than in the proper legal argument.


I’ve no strong opinion on this matter, but I stumbled by accident onto cursor/agent-trace on GitHub (“A standard format for tracing AI-generated code”), which seems related to the topic at hand. I suppose all communities are trying to figure out a solution.


This is already possible within the SPDX license spec (since v3):

ISC WITH AdditionRef-anthropic-claude-opus-4-6

opam supports the full spec and will not raise any warning with this syntax (since opam 2.4, or 2.2 if you build it with the latest version of the spdx_licenses library).
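For tooling that wants to read the model tag back out of such an expression, here is a minimal sketch. It only handles the single `<license> WITH <addition>` form (no AND/OR, parentheses or `+`), and `parse_simple`/`addition_ref_id` are hypothetical helpers; a real tool should rely on a complete SPDX expression parser such as the spdx_licenses library mentioned above.

```ocaml
(* Hypothetical sketch: split a simple SPDX expression of the form
   "<license> WITH <addition>" and extract the user-defined part of an
   AdditionRef- addition. AND/OR, parentheses and "+" are not handled. *)

type spdx = { license : string; addition : string option }

(* Split on spaces, dropping empty fragments from repeated spaces. *)
let words s = String.split_on_char ' ' s |> List.filter (fun w -> w <> "")

let parse_simple (expr : string) : spdx option =
  match words expr with
  | [ license ] -> Some { license; addition = None }
  | [ license; "WITH"; addition ] -> Some { license; addition = Some addition }
  | _ -> None (* anything more complex needs a real parser *)

let addition_ref_prefix = "AdditionRef-"

(* The text after the AdditionRef- prefix, e.g. a model tag. *)
let addition_ref_id (e : spdx) : string option =
  match e.addition with
  | Some a when String.starts_with ~prefix:addition_ref_prefix a ->
    let n = String.length addition_ref_prefix in
    Some (String.sub a n (String.length a - n))
  | _ -> None
```

Keeping the tag inside the license expression means it travels with the existing opam license field, rather than needing a separate disclosure field.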


Companies are pushing out new models at a rapid pace. If multiple models are used and different contributors use different vendors, the list can get long quickly. In-code annotations might not be ideal in the long run.
