Ocaml-ts-mode - Emacs ocaml major mode using tree sitter

This weekend I got interested in emacs major modes based on treesitter, so I decided to implement ocaml-ts-mode. The code can be found here:

It was actually very easy thanks to the hard work others have done in creating the treesitter grammar.

It supports:

  • Syntax highlighting of .ml and .mli files.
  • Initial indentation support (this is probably currently broken in multiple places)
  • C-c C-a - Switch between .ml and .mli files for a module.
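For illustration, the .ml/.mli switch can be sketched in a few lines of Emacs Lisp. This is only a minimal sketch, not necessarily how the mode implements it: the function name `my/ocaml-switch-ml-mli` is hypothetical, and the binding assumes the package provides the `ocaml-ts-mode` feature and a keymap named `ocaml-ts-mode-map`.

```elisp
;; Hypothetical helper: jump between foo.ml and foo.mli.
(defun my/ocaml-switch-ml-mli ()
  "Visit the .mli for the current .ml file, or the other way around."
  (interactive)
  (let ((file (or buffer-file-name
                  (user-error "Buffer is not visiting a file"))))
    (find-file
     (cond ((string-suffix-p ".mli" file) (substring file 0 -1)) ; foo.mli -> foo.ml
           ((string-suffix-p ".ml" file) (concat file "i"))      ; foo.ml -> foo.mli
           (t (user-error "Not an .ml or .mli file: %s" file))))))

;; Assumption: the mode's keymap is called `ocaml-ts-mode-map'.
(with-eval-after-load 'ocaml-ts-mode
  (define-key ocaml-ts-mode-map (kbd "C-c C-a") #'my/ocaml-switch-ml-mli))
```

If the counterpart file does not exist yet, `find-file` simply opens an empty buffer for it.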

This is less feature rich than Tuareg mode, so I don’t know if it replaces it. If anything, maybe it could be integrated into tuareg mode.

I have no experience writing elisp so this probably has obvious mistakes. Contributions welcome!

Enjoy!


Whoops, in my excitement learning about treesitter I didn’t notice that there already is an ocaml-ts-mode. That one is probably the preferred one to use. I’ll probably direct any of my development towards it. But learning how to make a major mode was a fun experience.


I’m not sure which mode you’re referring to, but I don’t think that there’s any ocaml-ts-mode with active development and any meaningful adoption yet. Perhaps yours will be the first one!

This is the mode, I guess: GitHub - dmitrig/ocaml-ts-mode: OCaml major mode using tree-sitter

Unfortunately it’s a WIP without any development in 6 months.
Any competent people to help?

I’ve been pushing a few changes here and there for my ocaml-ts-mode. Some known issues:

  1. paragraph fill does not work correctly.
  2. the language mode indentation totally doesn’t work
  3. doesn’t work with hideshow mode, which would be nice. Part of that is just that OCaml isn’t a language very friendly to hideshow mode.

Other than that for the most part it suits my needs but happy to take PRs. I want to fix the fill paragraph for sure at some point. I use ocamlformat which mostly alleviates issues (1) and (2).

Apologies for necro-ing this thread. But …

I spent time this weekend hacking on ocaml-ts-mode to update it with the latest treesitter ocaml grammar. It is a work in progress, aka I’m learning how this all hangs together and will have made horrible mistakes somewhere :slight_smile:

This is the configuration I have so far with ocaml-eglot and no tuareg-mode.

(use-package ocaml-eglot
  ;; Clone of https://github.com/tarides/ocaml-eglot with minor fixes https://github.com/tarides/ocaml-eglot/pull/29
  :load-path ("~/.emacs.d/personal/modes/ocaml-eglot")
  :after ocaml-ts-mode
  :hook
  (ocaml-ts-mode . ocaml-eglot)
  (ocaml-eglot . eglot-ensure)
  :bind
  (:map ocaml-eglot-map ; Re-use sexp navigation bindings
        ("C-M-n" . ocaml-eglot-phrase-next)
        ("C-M-p" . ocaml-eglot-phrase-prev))
  :config
  ;; These are ordinary Lisp forms, so they belong under :config;
  ;; :custom only accepts (VARIABLE VALUE) pairs.
  ;; Unbind some keys in favour of flycheck and xref
  (unbind-key "C-c C-x" ocaml-eglot-map) ; Use flycheck-next-error
  (unbind-key "C-c C-c" ocaml-eglot-map) ; Use flycheck-prev-error
  (unbind-key "C-c C-l" ocaml-eglot-map) ; Use xref-find-definitions
  (unbind-key "C-c C-i" ocaml-eglot-map) ; Use xref-find-definitions
  (unbind-key "C-c C-p" ocaml-eglot-map) ; Use C-M-p
  (unbind-key "C-c C-n" ocaml-eglot-map) ; Use C-M-n
  (setq ocaml-eglot-syntax-checker 'flycheck))

;; TODO Forked OCaml Tree-sitter https://github.com/tmcgilchrist/ocaml-ts-mode/tree/fixes
(use-package ocaml-ts-mode
  :load-path ("~/.emacs.d/personal/modes/ocaml-ts-mode/"))

(use-package flycheck-eglot
  :ensure t
  :after eglot
  :config
  (global-flycheck-eglot-mode 1))

(use-package eglot
  :config
  (add-to-list 'eglot-server-programs
               '((ocaml-ts-mode :language-id "ocaml") . ("ocamllsp")))

  :hook ((ocaml-ts-mode . eglot-ensure)))

It supports some structural navigation using ocaml-eglot and the C-M-p / C-M-n keys, and most of the syntax highlighting works (though it gets confused with mli files). I’m not sure how to support those properly with a single major-mode and different treesitter grammars. Proper indentation without ocamlformat or a similar external tool doesn’t work at all; that is my next goal after fixing mli highlighting.


TreeSitter support also came up in Tuareg’s issue tracker recently. (see Use tree-sitter · Issue #305 · ocaml/tuareg · GitHub)

It seems that Jane Street are willing to fund the development of a standalone ocaml-mode using TreeSitter, or its integration in Tuareg. Given how pervasive Tuareg is, it seems reasonable to me to do the work there, but I know that many people would probably disagree. At any rate - it’d be nice if TreeSitter efforts for Emacs were focused and coordinated, given the small size of our community.


FWIW I use caml-mode but I think the only features I actually use are

  1. The syntax highlighting.
  2. The regexp error format[1].
  3. Switch view (C-c C-a, switch between .ml and .mli)

For the rest ocp-indent, merlin and compilation-mode are in charge.

So if a solid 1+2 can get into emacs itself as suggested in the linked discussion all the better. Besides I wouldn’t mind changing to tuareg or a consolidated ocaml-mode (my preference) as long as these fundamentals are there, nothing too fancy gets in the way and I can convince the syntax highlighting to get to something not too far from what I have now (I know, :older_man:t2:).

As for the reason why I’m using caml-mode, it is that in the past that’s where the support for nifty stuff would first happen (like the now defunct .annot file for showing types) and I got too used to (my tweak of) its syntax highlighting scheme. I think I also had an aversion for how tuareg indented code, but I guess it allows ocp-indent to override nowadays.

Btw. one thing that tree sitter hackers should consider is to provide good support for ocamldoc comments and .mld files (which are ocamldoc comments). Editing them is all bland and I spend more time working on them than writing source code.


  1. Though I think it is still broken in some way and I have that in my .emacs at the moment. Didn’t try to remove it in the recently released Emacs 30 to see if things have improved (but I doubt it). ↩︎


I think that doing the work within the existing modes has the virtue of not fracturing the ecosystem even more. And anyway, for syntax highlighting (one of the main applications of tree-sitter), you will have to start from scratch either way.

Fully agree!

Cheers,
Nicolas

I’ll experiment with some approaches for creating a TreeSitter-based major mode, and if I get to some meaningful place I’ll decide if this is something that can be merged with tuareg-mode (it already has 2 indentation engines) or live as something standalone. I’m not an OCaml expert by any stretch, but I have quite a lot of experience with creating and maintaining Emacs extensions. (including clojure-mode, CIDER and inf-clojure) One of my main frustrations with OCaml so far is that the Emacs modes are not nearly as good as those for other languages out there, so this might be a good opportunity to play with TreeSitter and do something useful for the broader OCaml community.

I think that getting the font-locking right with TreeSitter is quite easy, and the real challenge is implementing the indentation logic.
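To give a rough idea of the font-locking side, here is a minimal sketch using Emacs’s built-in treesit library (Emacs 29+). The helper name is hypothetical, and it assumes an OCaml grammar is installed under the language symbol `ocaml` and that the grammar exposes `comment` and `string` nodes; a real mode would cover many more node types.

```elisp
(require 'treesit)

;; Hypothetical helper, meant to be called from a tree-sitter mode's body
;; after a parser for the `ocaml' language has been created.
(defun my/ocaml-ts-setup-font-lock ()
  "Install a tiny set of tree-sitter font-lock rules for OCaml."
  (setq-local treesit-font-lock-settings
              (treesit-font-lock-rules
               :language 'ocaml
               :feature 'comment
               '((comment) @font-lock-comment-face)

               :language 'ocaml
               :feature 'string
               '((string) @font-lock-string-face)

               :language 'ocaml
               :feature 'keyword
               '(["let" "in" "match" "with"] @font-lock-keyword-face)))
  ;; Features are grouped by decoration level.
  (setq-local treesit-font-lock-feature-list
              '((comment) (string keyword)))
  (treesit-major-mode-setup))
```

Indentation uses the same library through `treesit-simple-indent-rules`, but the rule set needed for OCaml is far larger, which is where the real work lies.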

My own requirements for the major mode are pretty modest - good font-locking and consistent indentation. I think it’d be nice to have the interop with subprocesses (e.g. ocaml and utop) live in a standalone package (e.g. inf-ocaml) instead of bundling it in the major mode as tuareg does. But different people always have different perspectives on such topics.

Do you have a sense of which proportion of users still use the mode’s indentation logic rather than a standalone tool like ocp-indent or ocamlformat? If the number is very small it may not be worth spending a lot of energy on developing a tree-sitter-based indentation engine.

Cheers,
Nicolas

No idea. Usually, it’s a good idea to run a survey to get meaningful data on how people use some tool. (I was doing this recently for CIDER, which is a Clojure IDE for Emacs) But I guess it’s a fair assessment that some Emacs users would like to get things done without relying (too much) on the presence of external tools. (me included :smiley: )

FWIW I wouldn’t switch to something that is different from the configuration of ocp-indent I have been using for ages in 60+ repos – even though it still has a few inconsistent cases I’m unhappy with.

For formatting, theoretically (I never tried), if you have a tree sitter grammar you should be able to use something like topiary. I wonder if someone has developed something similar for indenting, which operates slightly differently from formatting but is generally what I’m seeking (is indenting just incremental formatting?).

In my experience, most emacs integration with external formatting tools is triggered via after-save-hook or something similar, whereas a mode’s built-in indenting behavior is instantaneous (or at least, triggered on each carriage return).

So I think there’s an argument for having both: a very simple in-mode formatter to keep real-time edits from growing illegible, plus an external tool to apply more definitive formatting later.

Note this is not the case for ocp-indent which is both an external tool and “instantaneous”.

I would hate something that reformats only on save and possibly lays out the code differently from how the “simple in-mode” indenter did. Layout influences how I write the code to make it legible.

Ah, that’s interesting - the Elisp wrapper for ocp-indent just overrides the major mode’s indent-line and indent-region functions. (see ocp-indent/tools/ocp-indent.el at master · OCamlPro/ocp-indent · GitHub)
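As a sketch of what that could look like for a tree-sitter mode: the snippet below assumes ocp-indent.el is on the `load-path`, that it provides the `ocp-indent` feature together with a setup function called `ocp-setup-indent` (the name as I recall it from that file, so treat it as an assumption), and that the mode’s hook is `ocaml-ts-mode-hook`.

```elisp
;; Assumption: ocp-indent.el provides the `ocp-indent' feature and an
;; `ocp-setup-indent' function that sets `indent-line-function' and
;; `indent-region-function' buffer-locally.
(require 'ocp-indent)

;; Mode hooks run after the mode body, so this overrides whatever
;; indentation functions the tree-sitter mode installed itself.
(add-hook 'ocaml-ts-mode-hook #'ocp-setup-indent)
```

Because only the two indent variables are replaced, font-locking and navigation from the tree-sitter mode stay untouched.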

This will definitely work well with any OCaml mode based on TreeSitter, so we might be closer to a usable solution than I had previously thought.


@bbatsov I’ve spent the last week hacking on using treesitter for ocaml-ts-mode. I would prefer having a basic level of indentation available without external tools, which are often not available when hacking on the OCaml compiler or are inconvenient when switching between OCaml versions. I haven’t used ocp-indent; perhaps it is tolerant of both of these situations.

My goals are to:

  • Provide syntax highlighting for ml and mli files within the same mode.
  • Provide indentation based on treesitter grammar only
  • Extend combobulate with OCaml support for structured navigation
  • Then see whether it is suitable for integrating into tuareg or needs to be supplied separately.

The weird thing with tree-sitter-ocaml is it provides 3 different grammars. One for ocaml, one for interfaces and one for type. I’m not sure why it needs to be that way and how to reconcile that into a single major mode for Emacs, so there is a single font lock setup and indentation across ml/mli/menhir etc files. A few minor fixes for that mode #4. I’d appreciate a clarification from someone about this, I’m a bit lost in treesitter and Emacs modes documentation. Currently ocaml-ts-mode provides two major modes with different font lock setups, one for ml files and one for mli files. This breaks the assumption of eglot that each major mode maps to one LSP server.

My :rooster: scratchings on combobulate are here. Nothing major to show for that so far. Just the promise of structured navigation based on treesitter without needing a running LSP/merlin.

In the meantime, paredit works with OCaml in conjunction with tuareg. It’s not as good as what tree-sitter will offer, but decent.

The weird thing with tree-sitter-ocaml is it provides 3 different grammars. One for ocaml, one for interfaces and one for type. I’m not sure why it needs to be that way and how to reconcile that into a single major mode for Emacs, so there is a single font lock setup and indentation across ml/mli/menhir etc files. A few minor fixes for that mode #4. I’d appreciate a clarification from someone about this, I’m a bit lost in treesitter and Emacs modes documentation. Currently ocaml-ts-mode provides two major modes with different font lock setups, one for ml files and one for mli files. This breaks the assumption of eglot that each major mode maps to one LSP server.

We’ll need 3 modes - one for each grammar, but I don’t think that’s a big deal, overall, as working with the 3 grammars will be pretty similar.
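A minimal sketch of that layout, covering only the implementation and interface grammars: all `my/` names are hypothetical, the language symbols `ocaml` and `ocaml-interface` assume the grammar libraries are installed under those names, and the "ocaml.interface" language id is my assumption about what ocamllsp expects for .mli buffers.

```elisp
(require 'treesit)

;; Hypothetical shared setup; the `my/' names are illustrative only.
(defun my/ocaml-ts--setup (language)
  "Create a tree-sitter parser for LANGUAGE and enable treesit features."
  (when (treesit-ready-p language)
    (treesit-parser-create language)
    ;; Font-lock and indentation rules for LANGUAGE would be set here.
    (treesit-major-mode-setup)))

(define-derived-mode my/ocaml-ts-mode prog-mode "OCaml[ts]"
  "Sketch of a mode for .ml files."
  (my/ocaml-ts--setup 'ocaml))

(define-derived-mode my/ocaml-interface-ts-mode prog-mode "OCaml-intf[ts]"
  "Sketch of a mode for .mli files."
  (my/ocaml-ts--setup 'ocaml-interface))

;; One `eglot-server-programs' entry can list several major modes,
;; so both modes can share a single ocamllsp server.
(with-eval-after-load 'eglot
  (add-to-list 'eglot-server-programs
               '(((my/ocaml-ts-mode :language-id "ocaml")
                  (my/ocaml-interface-ts-mode :language-id "ocaml.interface"))
                 . ("ocamllsp"))))
```

Keeping the shared logic in one function means each extra grammar only costs a short `define-derived-mode` form.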

I don’t plan to contribute to either of the 2 existing ocaml-ts-mode prototypes because they both seem like experiments by people who are not familiar with Emacs and haven’t seen any development after their first 5 commits. The mode you’re referring to hasn’t been touched in 2 years, for example. (not to mention it does nothing besides the font-locking)

If someone can create for us a canonical repo called ocaml-mode under the OCaml org where we can contribute directly, that’d be one option; otherwise I’ll start my own experimental version as a personal repo under some other name to avoid confusion with the other project.

I started hacking a bit on this yesterday evening and I’ve already sketched out some ideas - e.g. I’ve noticed that structural navigation is quite easy with TreeSitter. (as is doing a limited form of go to definition in the current file, etc)
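As a hedged illustration of why navigation comes cheaply: once a parser exists, telling treesit which node types count as a "defun" is enough for the standard defun commands to work. The helper name is hypothetical and the node names are my reading of the tree-sitter-ocaml grammar, so they should be checked against the grammar version in use.

```elisp
(require 'treesit)

;; Hypothetical helper, meant to be called from a tree-sitter mode's body
;; after the parser has been created.
(defun my/ocaml-ts-setup-navigation ()
  "Make defun commands such as C-M-a and C-M-e follow OCaml definitions."
  ;; Assumed node names from the OCaml grammar.
  (setq-local treesit-defun-type-regexp
              (regexp-opt '("value_definition"
                            "type_definition"
                            "module_definition"
                            "external")))
  ;; Wires up `beginning-of-defun' and friends from the variable above.
  (treesit-major-mode-setup))
```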


I’ve since forked the original GitHub - terrateamio/ocaml-ts-mode: Ocaml mode for emacs using treesitter and hacked some things into that. It’s highlighting and indenting well enough for my basic needs in ml/mli files. I’m sure it could be improved (with more angry fruit salad) and the treesitter queries tidied up.

Ideally there would be a core treesitter based mode that just does that and works on plain OCaml source, then layer on top LSP and other nice things.
