Neocaml-mode (A modern Emacs major mode for OCaml) is looking for testers

Hey everyone,

Just wanted to let you know I’ve spent a bit of time polishing neocaml, and I think it’s now ready to be used (or at least tested) by more people. The font-locking and indentation are more or less done, and there’s also a basic integration with OCaml toplevels. Coupled with something like ocaml-eglot, the existing functionality should get you pretty far.

You’ll still have to install it from the GitHub repo (you’ll find detailed instructions there), but I’ve also opened a MELPA recipe PR, so I hope the installation process will become simpler soon.
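For the GitHub route, here's a minimal sketch. It assumes Emacs 29 or newer (which ships package-vc-install) and uses the bbatsov/neocaml repository URL from the PR link later in this thread. The README remains the reference, in particular for installing the OCaml tree-sitter grammars.

```elisp
;; Emacs 29+: clone and install the package straight from its Git repo.
;; The URL is the bbatsov/neocaml repository linked later in the thread.
(package-vc-install "https://github.com/bbatsov/neocaml")
```

Once the MELPA recipe is merged, a plain package-install should presumably be enough.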

Feel free to share feedback and feature requests here and over at GitHub!

P.S. I know the name is a bit controversial (some people said it evokes nvim vibes), and down the road I may just rename it to ocaml-mode or try to merge it with Tuareg. Naming remains hard…


This looks interesting, thanks!

I’m an old merlin user and I haven’t seriously considered switching to ocaml-eglot yet. Do you know if your mode works well with merlin on top? (I use (add-hook 'tuareg-mode-hook 'merlin-mode t) in my configuration file, is there a similar hook in neocaml?)

Skimming the source I have some naive questions (I’ve never looked at how tree-sitter-derived language modes are implemented before):

  • why do you need a list of keywords, isn’t this already provided by the tree-sitter grammar?
  • I wonder why the :feature 'type definition needs to list basically all type-forming grammar rules, this seems redundant with the grammar

I’m also clinging to old but trusty technology (merlin + caml-mode), but what would make me seriously consider shaking up that setup is support for syntax highlighting in .mld files. When I edit long documents and cookbooks I tend to get lost, and a bit of color signposting would help.

I don’t think neocaml-mode does that, or does it?

Adding support for .mld files should be pretty simple, provided there’s already an existing TreeSitter grammar for those. I guess I’ll have to check for this.

I’m hoping that neocaml will become the go-to replacement for caml-mode one day, but that will take a bit of extra work and a lot more users (so there’s more guidance for the development).

That’s a limitation of the Emacs TreeSitter API - it doesn’t support the .scm query files directly, so some things have to be repeated in Elisp queries. Probably this will change down the road, but for now we have to do some extra work. For the same reason the indent queries are in Elisp as well, even if the grammar provides some already.
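To make the duplication concrete, here's a minimal sketch (not neocaml’s actual rules): a comment capture that tree-sitter-ocaml’s highlights.scm expresses roughly as (comment) @comment, plus a handful of keyword tokens, restated as Elisp queries whose capture names are the Emacs faces to apply. Evaluating it assumes the ocaml grammar is installed.

```elisp
;; highlights.scm in tree-sitter-ocaml has captures along the lines of
;;   (comment) @comment
;; Emacs can't load that file, so the pattern is written again in Elisp,
;; with the capture named after the face to apply.
(treesit-font-lock-rules
 :language 'ocaml
 :feature 'comment
 '((comment) @font-lock-comment-face)

 ;; Likewise, the keyword tokens exist in the grammar, but the set to
 ;; highlight has to be spelled out again on the Elisp side.
 :language 'ocaml
 :feature 'keyword
 '(["let" "in" "match" "with"] @font-lock-keyword-face))
```

Each query needs its own :language and :feature, since treesit-font-lock-rules resets them after every query.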

I wonder why the :feature 'type definition needs to list basically all type-forming grammar rules, this seems redundant with the grammar

Pretty much the same as above. Also keep in mind that, as grammar definitions vary wildly, it’s not easy to map them to Emacs constructs automatically. That said, I think some work has been done in Emacs 31, at least for some form of font-locking auto-derived from the .scm files.

I’m an old merlin user and I haven’t seriously considered switching to ocaml-eglot yet. Do you know if your mode works well with merlin on top? (I use (add-hook 'tuareg-mode-hook 'merlin-mode t) in my configuration file, is there a similar hook in neocaml?)

Yeah, there’s neocaml-mode-hook that you can use.
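For reference, the neocaml equivalent of your tuareg line would look like this. It assumes merlin-mode is available (e.g. autoloaded by merlin’s own Emacs setup). Later snippets in this thread use neocaml-base-mode-hook, which presumably also covers .mli buffers.

```elisp
;; Same as the tuareg-mode-hook line, but for neocaml buffers.
;; The trailing t appends merlin-mode after other hook functions.
(add-hook 'neocaml-mode-hook #'merlin-mode t)
```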

A couple of small updates:


Regarding your comments about investing time in the REPL mode: I have never been convinced by the experience provided by current modes (see this message for some reasons). I think that if you want nice REPL-driven development with OCaml, what you want is a Proof General-like experience, which @Gopiandcode explored here.

Thanks for putting it on MELPA; this greatly increases the chances that I give it a try on a procrastination day :–)


Btw, does it install a regexp for compilation mode (which, just FYI, is invoked with C-c C-c in caml-mode ;–)?

To this day I still have this in my init.el to get correct selections, which I would gladly get rid of (the linked issue is closed, but the problem remains AFAIK):

;; Will we get to something at some point ?
;; https://github.com/ocaml/caml-mode/issues/5

(defun caml--begin-column ()
  (and (match-beginning 6)
       (+ (string-to-number (match-string 6)) 1)))

(defun caml--end-column ()
  (and (match-beginning 7)
       (string-to-number (match-string 7))))

(when (boundp 'compilation-error-regexp-alist-alist)
  (push `(ocaml ,caml--error-regexp 3 (4 . 5)
                (caml--begin-column . caml--end-column) (8) 1
                (8 font-lock-function-name-face))
        compilation-error-regexp-alist-alist))

(when (boundp 'compilation-error-regexp-alist)
  (push 'ocaml compilation-error-regexp-alist))

Yeah, now it does. After seeing your comment I’ve adapted the logic from tuareg to neocaml, so I think we are solid on this front. More details - Add OCaml compilation error regexp for M-x compile by bbatsov · Pull Request #4 · bbatsov/neocaml · GitHub
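If you want to confirm the entry is actually in place, here's a quick check. It's a sketch that only inspects standard compile.el variables, plus a throwaway compile command whose output mimics an OCaml error location (foo.ml is a made-up file name).

```elisp
;; Run these with M-: (or in *scratch*) after neocaml has been loaded,
;; e.g. once you've visited an .ml file.
(require 'compile)

;; Is an `ocaml' format registered in the global table of known formats?
(assq 'ocaml compilation-error-regexp-alist-alist)

;; Is it active, i.e. among the formats compilation buffers try?
(memq 'ocaml compilation-error-regexp-alist)

;; End-to-end check: this line of output should show up as an error
;; location pointing at foo.ml.
(compile "echo 'File \"foo.ml\", line 3, characters 4-7:'")
```

A non-nil result from the first form but nil from the second would mean the format is known but not active.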


Thanks!

Btw, I always thought it would be nice to upstream that to compilation mode itself. I routinely use the format outside OCaml to dump AST representations (see e.g. cmarkit locs or jsont locs if you have cmarkit or jsont installed) instead of the GNU error format, whose default regexp does not match when the location printout is indented.

But for that it would likely be good for upstream to act on this issue, so that there is a proper reference to point to.


Yeah, it’d definitely be nice to upstream this, and it should be easy enough. I’m just not super fond of the “patches on mailing list” contribution workflow for Emacs, but I guess I can stomach it for a good cause sometime in the future. :grin:


So I gave it a quick try but reverted back (for now!).

I don’t really have the time to investigate, as I’m actively trying to prevent myself from losing a day to reviewing my Emacs setup, but:

  1. It seems the compilation mode regexp is not working for my (non-OCaml) use cases. Is it installed globally once neocaml has loaded?
  2. I’d need to be able to customize the syntax highlighting scheme. Is that possible? (Emacs being Emacs, I guess it is; I’m rather asking what the idiomatic way of doing this is.)
  3. Though the docs mention that it is possible to use ocp-indent, it was not immediately clear to me how. Could you perhaps add the appropriate snippet to the docs?

In that other thread, the crux of the issue seems to be:

I wondered if you’ve given org-babel a try. It’s somewhat similar to writing code blocks in MDX, in the sense that you can have a toplevel session that spans blocks (or not, if you prefer).

A simple example:

.emacs settings:

(org-babel-do-load-languages
 'org-babel-load-languages
 '((ocaml . t)))

; better ergonomics, but only for trusted code blocks!
(setq org-confirm-babel-evaluate nil)

babel-example.org:

#+begin_src ocaml
  let x = 3
#+end_src

#+RESULTS:
: 3

#+begin_src ocaml
  let y = x + 1
#+end_src

#+RESULTS:
: 4

The initial version of this buffer contained only the #+begin_src...#+end_src blocks, and the results were generated automatically and sequentially in response to C-c C-v C-b. It may not be desirable that later calls to this command reuse the session, but there are workarounds for that.

No. But that’s not really what I’m looking for: I’m not interested in writing proto-OCaml and getting REPL responses in my sources. I’m interested in writing regular .ml and .mli files that are gradually (re)evaluated, along with their dependencies, into a consistent REPL environment.


Thanks for giving it a try! Let me address your points:

1. Compilation mode regexp – Yes, neocaml installs the OCaml compilation regexp globally (replacing the ocaml entry in compilation-error-regexp-alist-alist), which is what you were asking about in a previous comment. It should only match OCaml-shaped error messages (File "...", line N, characters N-N:), so it shouldn’t interfere with other compilers. Could you share an example of the non-OCaml compilation output that’s getting matched incorrectly? That would help me tighten the regexp if needed.

2. Syntax highlighting customization – Absolutely. neocaml uses standard tree-sitter font-lock, so there are several levels of customization:

You can control how much gets highlighted via treesit-font-lock-level (1-4, default 3):

;; Minimal highlighting (only comments and definitions)
(setq treesit-font-lock-level 1)

;; Maximum highlighting (everything neocaml supports)
(setq treesit-font-lock-level 4)

You can also remap faces for specific syntactic constructs. For example, if you want type names to use a different face:

(custom-set-faces
 '(font-lock-type-face ((t (:foreground "DarkSeaGreen4")))))

Or, for more granular control, you can selectively enable/disable font-lock features by calling treesit-font-lock-recompute-features (it takes a list of features to enable and a list of features to disable) from the mode hook. The available features are: comment, definition, keyword, string, number, attribute, builtin, constant, type, operator, bracket, delimiter, variable, function.

3. ocp-indent setup – Fair point, I should have included a snippet. Here’s what you need:

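;; Assumes ocp-indent.el (installed by the ocp-indent opam package) is on your load-path.
(require 'ocp-indent)

(defun my-neocaml-mode-setup ()
  ;; Delegate line and region indentation to ocp-indent.
  (setq-local indent-line-function #'ocp-indent-line)
  (setq-local indent-region-function #'ocp-indent-region))

(add-hook 'neocaml-base-mode-hook #'my-neocaml-mode-setup)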

I’ll add this to the README.


Thanks for your answers.

For 1. I opened an issue.

Just FYI, regarding my need for 2., I quickly tried the different font-lock levels, but none was really suitable, and the highlighting actually impeded my ability to read the code.

It’s likely personal (after years of use, my brain is too used to my current, slightly tweaked caml-mode scheme), but I still think some things in the defaults don’t really make sense in a language like OCaml. In my book functions are values, so I find it distracting to have a different color for the let-bound f in these two cases:

let f = M.f
let f x = x

But YMMV.

Yeah, I totally get it. For what it’s worth, the font-locking defaults are mostly derived from the official grammar’s font-locking queries (see tree-sitter-ocaml/queries/highlights.scm at master · tree-sitter/tree-sitter-ocaml · GitHub). As those queries are used automatically by most editors that support Tree-sitter, I didn’t want us to deviate from them too much, in the interest of some consistency across the OCaml community.

The 4 font-locking levels follow the official Emacs recommendations for the same reason. I understand this will not be ideal for everyone, but I think sticking to the standards is not a bad thing overall (and, of course, Emacs lets you customize anything you don’t agree with).
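For the let f example above, here's a sketch of such a customization; the function name is hypothetical, and this is a local rule, not something neocaml ships. It assumes tree-sitter-ocaml’s node names (a let_binding whose pattern field holds a value_name) and that .ml buffers use a parser whose language is ocaml. Swap font-lock-variable-name-face for font-lock-function-name-face if you’d rather have both names look like functions.

```elisp
;; Sketch: one face for the name bound by a `let', with or without
;; parameters, so `let f = M.f' and `let f x = x' look the same.
(defun my-neocaml-uniform-let-names ()
  (let ((parser (car (treesit-parser-list))))
    ;; Only touch buffers parsed with the `ocaml' (.ml) grammar.
    (when (and parser (eq (treesit-parser-language parser) 'ocaml))
      (setq treesit-font-lock-settings
            (append treesit-font-lock-settings
                    (treesit-font-lock-rules
                     :language 'ocaml
                     :override t          ; win over the default rules
                     :feature 'definition
                     '((let_binding pattern: (value_name)
                                    @font-lock-variable-name-face)))))
      ;; Re-apply feature selection so the new rule takes effect.
      (treesit-font-lock-recompute-features))))

(add-hook 'neocaml-base-mode-hook #'my-neocaml-uniform-let-names)
```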

So at this point only point 2. keeps me from switching. I have tried tweaking it a bit, but I’m not sure I’m getting anywhere with the provided toggles: the problem is less the colors (which are easy to swap around) than some particular distinctions that don’t seem to be made.

I’m not particularly into fruit salads (and the Emacs theme I’m using doesn’t have a separate color for every font-lock-* face anyway), but I’m used to these distinctions being made. For example, I’d like to distinguish “block” keywords, named infix operators, and labels (which look no different from values in neocaml).

What’s your take on that? Should I simply take the time to define my own treesit-font-lock-rules?

To give an idea, I’m roughly looking to transform this (level 4):


into:

(The colors don’t matter, I’m happy to swap them around, but the syntactic distinctions do)

All three distinctions are doable with the current setup. Here’s a breakdown and an example config:

Labels are the easiest – neocaml already highlights label_name with font-lock-property-use-face at level 4. The issue is likely that your theme renders it identically to font-lock-variable-use-face. You can either remap the face or pick a theme that distinguishes them.
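If you’d rather not change the face globally, a buffer-local alternative is face-remap-add-relative, which remaps the label face only in neocaml buffers. In this sketch the hook function name is hypothetical and the color is just a placeholder.

```elisp
;; Sketch: make labels stand out only in neocaml buffers, leaving the
;; theme's global definition of the face untouched.
(defun my-neocaml-distinct-labels ()
  (face-remap-add-relative 'font-lock-property-use-face
                           :foreground "DarkOrange3"))

(add-hook 'neocaml-base-mode-hook #'my-neocaml-distinct-labels)
```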

Block keywords and named infix operators aren’t distinguished out of the box (all keywords share font-lock-keyword-face, all operators share font-lock-operator-face), but you can add custom treesit-font-lock-rules via a hook to split them out:

;; 1. Distinguish "block" keywords (begin/end, struct/sig, etc.)
(defface my-ocaml-block-keyword-face
  '((t :inherit font-lock-keyword-face :weight bold))
  "Face for OCaml block-delimiting keywords.")

(defun my-neocaml-add-block-keywords ()
  (setq treesit-font-lock-settings
        (append treesit-font-lock-settings
                (treesit-font-lock-rules
                 :language (treesit-parser-language
                            (car (treesit-parser-list)))
                 :override t
                 :feature 'keyword
                 '(["begin" "end" "struct" "sig" "object"
                    "do" "done" "fun" "function"]
                   @my-ocaml-block-keyword-face)))))

;; 2. Distinguish named infix operators (lor, land, mod, etc.)
(defface my-ocaml-named-operator-face
  '((t :inherit font-lock-keyword-face :slant italic))
  "Face for OCaml named infix operators.")

(defun my-neocaml-add-named-operators ()
  (let ((lang (treesit-parser-language (car (treesit-parser-list)))))
    (setq treesit-font-lock-settings
          (append treesit-font-lock-settings
                  (treesit-font-lock-rules
                   :language lang
                   :override t
                   :feature 'operator
                   `(((infix_expression
                       operator: _ @my-ocaml-named-operator-face)
                      (:match ,(rx bos (or "lor" "land" "lxor"
                                           "asr" "lsl" "lsr" "mod")
                                   eos)
                              @my-ocaml-named-operator-face))))))))

;; 3. Make labels visually distinct (if your theme doesn't already)
(custom-set-faces
 '(font-lock-property-use-face ((t (:foreground "DarkOrange3")))))

;; Hook it all together
(defun my-neocaml-font-lock-setup ()
  (my-neocaml-add-block-keywords)
  (my-neocaml-add-named-operators)
  (treesit-font-lock-recompute-features))

(add-hook 'neocaml-base-mode-hook #'my-neocaml-font-lock-setup)

The key idea is that treesit-font-lock-rules with :override t lets you layer additional rules on top of neocaml’s defaults. The rules use the same Tree-sitter query syntax, so you can target any node type or pattern. treesit-font-lock-recompute-features at the end ensures everything takes effect.

Controlling what gets highlighted: if you want less highlighting overall (e.g. labels and operators, but no bracket or delimiter noise), you don’t have to use the level system. You can explicitly select which features are active:

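;; Instead of setting treesit-font-lock-level, cherry-pick features.
;; This has to run inside each neocaml buffer, so do it from the mode hook:
(defun my-neocaml-pick-features ()
  (treesit-font-lock-recompute-features
   '(comment definition keyword string number attribute builtin
     constant type operator variable)  ; enable these
   '(bracket delimiter function)))     ; disable these

(add-hook 'neocaml-base-mode-hook #'my-neocaml-pick-features)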

The available features in neocaml are: comment, definition, keyword, string, number, attribute, builtin, constant, type, operator, bracket, delimiter, variable, function. This gives you fine-grained control – you can have operators and labels highlighted without bracket and delimiter noise.


The issue is likely that your theme renders it identically to font-lock-variable-use-face. You can either remap the face or pick a theme that distinguishes them.

According to list-faces-display, that seems to be the case (the culprit is color-theme-sanityinc-solarized).

Thanks for the rest, I’ll try it at some point.

That being said my idea for moving to neocaml was rather to have less stuff in my init.el :–)

Even if they map to the same font-lock-* face by default, would you be open to integrating some of these upstream? I think the one-size-fits-all nature of font-lock-* is a bit limiting. Some of the distinctions made by caml-mode are genuinely useful: they make it easier for the programmer to delineate and visually hop around the AST.

For example, the distinction between named infix operators and logical operators, which is indicative of precedence (see the screenshot above). Or the block-level keywords, which make it easy to delineate the blocks of conditionals (in then begin and end else, begin and end are distinguished from then and else).