ocaml-ts-mode - Emacs OCaml major mode using tree-sitter

The LSP features are always kind of orthogonal to what major modes provide in Emacs, as already demonstrated for OCaml by merlin-mode in the past and, more recently, by ocaml-eglot (tarides/ocaml-eglot on GitHub: an overlay on Eglot for editing OCaml code using LSP).

For me the most important aspects of an OCaml major mode are font-locking, indentation, and structural navigation (within the same file, as that’s what TS can help us with). Then come the nice-to-have things: basic utilities like find-other-file, project building/running (although those you can get from Projectile), and potentially a top-level integration.


Small update:

I’ve started hacking together the new Emacs mode by pulling ideas from left and right. Right now it does little besides installing the grammars automatically and font-locking the code in a crude way, but I hope that once the foundation becomes more solid it will become easy to extend it. The code is here: bbatsov/neocaml on GitHub (a modern, TreeSitter-powered Emacs major mode for OCaml). Naming is hard, but I am quite pleased with the neocaml name, given what it’s supposed to represent, plus there are way too many something-ts-modes out there already. :smiley: I consider this mostly an experimental project at this point. I hope I’ll get it to the state we discussed, but I can’t make any promises. Down the road, depending on how far I (we?) get, we can discuss renaming it or adding it to tuareg.

The most painful thing in development right now is that it’s hard to test the effect of the TreeSitter queries, so some OCaml support in combobulate would be priceless!


Right now tree-sitter-ocaml parses ocamldoc comments just as comments. I started working on a tree-sitter grammar for ocamldoc two years ago, but then forgot about it. So I have a half finished tree-sitter-ocamldoc lying around that I can revive, and once that is finished, supporting multiple tree-sitter languages in emacs doesn’t seem that complicated.

Indenting should be a lot easier than formatting. Neovim does it with tree-sitter queries and has an OCaml indents file, but I don’t know how good it is. In theory, it shouldn’t be that difficult to get comparable results to ocp-indent with that approach, but I don’t know how easily it can be implemented in emacs.

tree-sitter-ocaml has separate grammars for .ml and .mli files because they have a different and incompatible syntax. Originally there was only a single grammar, but keeping them combined created a lot of parsing conflicts (mostly because of the include statement), and the workarounds resulted in an inaccurate syntax tree that was difficult to query. Since the split, tree-sitter can make use of the fact that OCaml is an LR(1) language, and can be parsed without having to resolve conflicts (but that will change with the upcoming labeled tuples).
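To make the `include` point a bit more concrete, here is a minimal sketch of how the same keyword is followed by different things in the two contexts: a module type inside a signature (what a .mli file contains) and a module expression inside a structure (what a .ml file contains). All module names are made up for illustration, and this is only my guess at the kind of ambiguity meant above, not a description of the actual grammar conflicts.

```ocaml
(* Hypothetical names throughout; only the two uses of [include] matter. *)

(* A module type, as it could appear in a .mli file or between sig ... end. *)
module type SHAPE = sig
  val name : string
  val sides : int
end

(* In a signature, [include] is followed by a module TYPE. *)
module type COLORED_SHAPE = sig
  include SHAPE
  val color : string
end

module Triangle : SHAPE = struct
  let name = "triangle"
  let sides = 3
end

(* In a structure (the contents of a .ml file), [include] is followed by a
   module EXPRESSION. *)
module Red_triangle : COLORED_SHAPE = struct
  include Triangle
  let color = "red"
end

let () =
  Printf.printf "%s %s with %d sides\n"
    Red_triangle.color Red_triangle.name Red_triangle.sides
```

A single grammar that accepts both files has to decide which of the two forms follows `include`, whereas a grammar with a dedicated entry point already knows the context.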

If the multiple grammars are creating problems in emacs, maybe it can help to check how typescript is handling it. Typescript has separate tree-sitter grammars for .ts and .tsx files, and a single LSP server for both of them (and javascript).


I’ve looked at the Emacs support for this and it’s not complicated indeed.

Yeah, I’m optimistic that the basic indenting wouldn’t be very hard. I’ve noticed in the Emacs docs that there are 2 types of indentation logic that can be specified in the Elisp TreeSitter API. I just haven’t gotten to that part yet, as my initial focus was to try to understand the font-locking API and get something working there. I do a lot of work with Ruby linters, and one thing I’m missing right now is some documentation of the TS OCaml AST format (e.g. like parser/doc/AST_FORMAT.md in whitequark/parser on GitHub), as currently I’m just figuring out what is what by looking at the AST of random OCaml files and that’s not the most effective way to work. If someone knows if there are similar docs - please point me in this direction.

Yeah, I figured as much. Both older Emacs modes have taken the approach of treating the two grammars more or less the same, but I’m guessing this won’t work well in practice. Again, it would be nice to be able to see what the differences are, to be able to devise how to handle this in practice - if the differences are very small I can transform the OCaml rules for OCaml Interface, but if they are bigger I’d probably keep them separate, even if this is going to result in some amount of duplication.

I’ve also noticed there’s a third grammar called “OCaml Types” - what’s this about?

Feel free to open a pull request back if you think the changes should go back into the mainline.

FYI - turned out that getting some basic indentation wasn’t that complex indeed. I’ve tried adapting the nvim indent-queries for Emacs and things mostly work, but a 1:1 translation is tricky.

In case someone’s curious about the indentation logic - Implement some basic (and somewhat buggy) indentation logic · bbatsov/neocaml@4f5c2ad · GitHub


There is no documentation of the tree-sitter node types unfortunately. There is the node-types.json file that could in theory be used to generate some documentation, but as far as I know nobody has done that.

You can check the test corpus to get examples of most constructs. Apart from everything class-related, it should be pretty complete.

For highlighting I recommend using the highlight queries as a basis.

The ocaml and ocaml_interface grammars are the same grammar with a different entrypoint. The ocaml_interface is the same as what can be between sig ... end in the ocaml grammar. All indentation and highlighting logic should be the same.

The ocaml_type grammar was added to be used in editors for the type on hover popup. Not sure if that can be useful for emacs at some point, but probably not for the first version. There will be no files using the ocaml_type grammar.

Nice. I don’t think the neovim indentation has been used a lot by experienced OCaml programmers, so it would be good to get some feedback from people with opinions. But now that there is a working version, tweaking it should be easy.


Very helpful pointers! (might be a good idea to mention some of those things in the grammar repo’s README) Thanks!

I was wondering, would it be technically possible to write a good part of a new Emacs extension for OCaml, in OCaml, using ecaml? (If Jane Street is willing to fund some development effort, I imagine this question may have come up before.) I don’t remember where, but I recall reading that the ocaml-vscode extension was mainly written in OCaml and that this helps encourage OCaml developers to contribute. I would be interested in your thoughts on this.

Given that the Elisp APIs are not very complex, I’m not sure what the benefit of using ecaml would be. Part of the allure of Emacs (for me, at least) is that it’s super easy to develop extensions incrementally, test your changes on the fly and leverage the amazing built-in Elisp debugger.

For me there’s also the fact that I know Elisp a lot better than I know OCaml. :sweat_smile:


I agree that if you compare the calls to the Emacs APIs, the difference is mostly syntactic. For example:


(erase-buffer)

compared to:


Ecaml.Current_buffer.erase ()

In that sense, the Ecaml project has this counter-intuitive nature, where what is visible of it is the part that is not interesting.

Another way to say this, is that if you think of an extension structured as an MVC, I’d expect the Ecaml API to be used mainly in the V part, whereas I’d expect the benefits of the OCaml language to shine mostly in the M and C parts.

I sincerely hope my response doesn’t come across as lecturing in any way. I just mean to give you more context as to where I am coming from, and what I am thinking.

Also, now would be a good time for me to say that I don’t know what I am talking about. I am new to Ecaml and have only recently started working on Elisp extensions as part of a hobby project. Initially, I felt good about seeing things work as I wanted. However, a few hundred lines in, I started missing types and typed pattern matching in the state management part of the code.
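To show what I mean by missing types and typed pattern matching in the state management part, here is a minimal sketch in plain OCaml. Everything in it is hypothetical (the `state` and `event` types, the `step` and `describe` functions, the build-runner scenario), and it makes no Ecaml calls at all: it only stands in for the M and C parts, with the V part left out.

```ocaml
(* Hypothetical model of a small build-runner extension. No Ecaml calls
   here: this is only the pure state-management part. *)

type state =
  | Idle
  | Building of { target : string; started_at : float }
  | Failed of { target : string; errors : string list }

type event =
  | Start of { target : string; now : float }
  | Finished of string list (* compiler errors, empty on success *)
  | Reset

(* Every (state, event) pair is listed explicitly, so the compiler warns
   about any unhandled case if a constructor is added later. *)
let step (state : state) (event : event) : state =
  match state, event with
  | Idle, Start { target; now } -> Building { target; started_at = now }
  | Failed _, Start { target; now } -> Building { target; started_at = now }
  | Building _, Finished [] -> Idle
  | Building { target; _ }, Finished errors -> Failed { target; errors }
  | (Idle | Building _ | Failed _), Reset -> Idle
  (* Events that make no sense in the current state are ignored. *)
  | Idle, Finished _ -> state
  | Building _, Start _ -> state
  | Failed _, Finished _ -> state

let describe = function
  | Idle -> "idle"
  | Building { target; _ } -> "building " ^ target
  | Failed { target; errors } ->
    Printf.sprintf "%s failed with %d error(s)" target (List.length errors)

let () =
  let events =
    [ Start { target = "main.exe"; now = 0.0 };
      Finished [ "Unbound value foo" ] ]
  in
  let final = List.fold_left step Idle events in
  (* Prints: main.exe failed with 1 error(s) *)
  print_endline (describe final)
```

In Elisp I would typically represent the same state as a plist or an alist, and nothing would tell me when I forget a case or misspell a key.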

I appreciate your perspective, given your experience with Elisp, and I am very thankful to you for engaging in this discussion. I look forward to learning more about Elisp and watching further development on neocaml.

I get your point, and I’m guessing that if an Emacs extension is doing more complex things there might be some benefit to implementing it in a language that gives you more in terms of safety or library access. But in the case of major modes, where the bulk of the code is just configuring various Emacs frameworks (font-locking, indentation, expression boundaries for navigation, imenu, etc.), I think you’d be better off with the native APIs, as they’ll allow you to iterate faster. I’d love for Emacs Lisp to be able to communicate better the required shape of some nested data structures (as I constantly make such mistakes), but I’ve learned to live with such limitations.


@bbatsov do you happen to have a good reference on learning Elisp for someone who used to program in Common Lisp, plus docs on how to navigate Emacs’ APIs and background information on the application model it exposes?

Sometimes I fancy doing stuff (mostly in relationship to odig) but despite the idea that “this should just be lisp after all”, the rare things I ever managed to write in elisp were tiny and took an unreasonable amount of time to achieve.