Best practices for large API changes?

For the hell of it (and because it looks like it isn’t currently being maintained), I started looking at what it would take to update sedlex to use Uchar.t instead of int to represent Unicode characters, and perhaps to use @dbuenzli’s Unicode libraries instead of handling all the Unicode stuff on its own.

The changes needed aren’t very big as such, but they do completely break the sedlex API. Many type signatures end up changed to turn int into Uchar.t, and there are even a couple of places where sedlex currently uses an int value of -1 to indicate end of file, which isn’t possible in Uchar.t, likely necessitating exposing an option to indicate end of file instead.
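To make the shape of that change concrete, here is a minimal sketch. The functions `next_int` and `next_uchar` are hypothetical and are not sedlex's actual API, and the reader treats each byte as a Latin-1 code point purely to keep the example short.

```ocaml
(* Hypothetical old-style reader: code points are ints and -1 means
   end of input. Each byte is treated as a Latin-1 code point. *)
let next_int (s : string) (pos : int ref) : int =
  if !pos >= String.length s then -1
  else begin
    let c = Char.code s.[!pos] in
    incr pos;
    c
  end

(* Hypothetical new-style reader: -1 is not a valid Uchar.t, so end of
   input has to be expressed in the type, here as None. *)
let next_uchar (s : string) (pos : int ref) : Uchar.t option =
  if !pos >= String.length s then None
  else begin
    let c = Uchar.of_char s.[!pos] in
    incr pos;
    Some c
  end

(* Callers that compared against -1 now have to pattern match. *)
let count_chars (s : string) : int =
  let pos = ref 0 in
  let rec loop n =
    match next_uchar s pos with
    | None -> n
    | Some _ -> loop (n + 1)
  in
  loop 0
```

Every caller that used to test for -1 has to be rewritten as a match on the option, which is why the change cannot be made without breaking existing code.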

I have no idea if the maintainers would even want my changes, but let’s say for the sake of discussion they did. What’s the best practice for dealing with such a large change in a public library with a bunch of downstream users?

Merely adding new API calls and leaving the old ones seems unreasonable in this case (though if it were just a change to a couple of calls, I could see deprecating the old calls and encouraging the use of new ones). There’s no notion that I’m aware of in OCaml of versioning of interfaces, so that’s not an option.
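For the small-change case, the deprecation route might look like the sketch below, using OCaml's built-in `ocaml.deprecated` attribute. The module `Sketch` and both function names are made up for illustration and do not correspond to anything in sedlex.

```ocaml
module Sketch : sig
  (* Old entry point, kept for compatibility. Any use of it makes the
     compiler emit a deprecation alert carrying this message. *)
  val lexeme_char : int array -> int -> int
  [@@ocaml.deprecated "Use lexeme_uchar instead."]

  (* New entry point returning a proper Uchar.t. *)
  val lexeme_uchar : int array -> int -> Uchar.t
end = struct
  let lexeme_char buf i = buf.(i)

  (* Uchar.of_int raises Invalid_argument if the int is not a valid
     Unicode scalar value. *)
  let lexeme_uchar buf i = Uchar.of_int buf.(i)
end
```

This works when only a handful of functions change. When nearly every signature in the interface changes, as here, almost every function would need a duplicate.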

This seems to leave:

  1. Completely breaking the downstream users of a library and hoping they update.
  2. Forking and renaming a library and its modules, and treating the updated API as a completely new thing.

(And again, most of this is theoretical. I have no idea if @alainfrisch would even care for my changes in the first place. I’m asking what the usual practice is so I can understand it.)

In this case, I would say: Call it 2.0 and break compat. Sometimes, that’s the right thing to do. :slight_smile:

Side note: as someone with commit rights on sedlex, I would support this change.


So just to be clear, in this case, you are suggesting picking option (1), and the bump of the major version number would have human and opam effects only. Do I have that right?

Apparently I can’t have discuss messages less than 20 chars, so: Yes.


Cool, thank you. And I may submit a pull request at some point soon to sedlex. I presume that’s the preferred method?

BTW, as an aside, there seem to be a couple of pull requests (like one for better menhir integration) queued up. (I have not looked to see if either of them is sane, so don’t take this as a complaint; it may be appropriate that they’re queued up…)

> I have no idea if @alainfrisch would even care for my changes in the first place

I will certainly not oppose the change, and if Drup supports it, there is a good chance it goes in.

The situation with respect to me not supporting sedlex: I wrote it as a toy POC for the ppx “technology” (I think it was the first publicly available ppx), but I’ve never used it for any project at all. I’d be happy to completely hand over maintenance of the project.

There’s a real need for a Unicode-aware lexical analyzer generator, of course, and sedlex seems to be by far the best candidate around. I’m using it for a research compiler I’m working on. I would prefer for it to have some additional features, which is why I started looking at the source.

Sadly I don’t think I’m appropriate for maintaining the thing, as I’m an OCaml newcomer and my interest may end at some point. However, I am willing to contribute some fixes if someone is in a position to examine them, give me feedback, and commit them if they’re appropriate.

Perhaps it would be appropriate to solicit and select a new “official maintainer”, especially since a good, Unicode-aware lexical analyzer is of considerable use?

BTW, to me, these are the things I’d like to see happen to sedlex (and perhaps a separate post on this would be a good idea, since few interested people will be reading this thread):

  1. Switch from int to Uchar.t
  2. Adopt an externally maintained Unicode library
  3. Add a few new regular expression operators (leading and trailing context, for example).
  4. Most ambitiously, it might be nice to add a submatch capture construct (perhaps using the “as” keyword since it’s the most natural option).
  5. Make it easier to use sedlex with menhir with some sort of compatibility mode for menhir’s new API (I think someone has already submitted a pull request for something like that).
  6. Add tests, to support the changes above.

As I suggested, would a new post about this be appropriate?

I love the way Lwt handles this (#453). They open an issue listing the planned changes, along with several dedicated issues (e.g. #441) for tracking each one, come up with a “recommended course of action” for each change, search for usages of the changed or removed APIs, and notify the authors of libraries that depend on them about what they would need to do.


FYI, work in progress is here:

Note that my changes to the ppx rewriter aren’t quite baked, but it does work.