Camlp4r: using the correct end of phrase (single semicolon) in the OCaml REPL

I’m currently working with some code using camlp4r.

When I run OCaml from an Emacs buffer in Tuareg mode (Run OCaml REPL or Evaluate phrase), everything works with the ordinary/normal OCaml syntax (directive #camlp4o).
However, with the revised syntax (directive #camlp4r), which accepts only a single semicolon to end a phrase, there is confusion because the REPL still expects a double semicolon before it evaluates the phrase. And if I give it one, there is of course a parse error:
Parse error: ';' expected after [str_item] (in [phrase])
It’s the same error whether the phrase is sent from the buffer or typed directly in the toplevel. So the problem is not in passing the phrase, but in this unexpected “end of phrase” lexeme in the toplevel.

It works perfectly when writing the same phrase in a basic CLI Toplevel ($ ocaml then directive #camlp4r).

I’m surprised because #camlp4o and #camlp4r are ordinary topfind directives.

How can I fix that?

I’m not really familiar with the toplevel machinery. I just know that the files toploop.ml, topdirs.ml and topmain.ml in ocaml/toplevel could be related to this.
I had a look at tuareg.el, which handles shortcuts (e.g. shift return, ctrl return to insert “;;”) and seems to send the phrase to the REPL:

(defun tuareg-interactive-send-input ()
  "Send the current phrase to the OCaml REPL or insert a newline.
If the point is next to \";;\", the phrase is sent to the REPL,
otherwise a newline is inserted and the lines are indented."

But it’s still unclear how I could change/fix this (while keeping the behaviour for normal/ordinary syntax). In fact, I don’t know which component is responsible for handling that.

Thanks.

Luc, I don’t have an answer, but at least a little idea on how to debug (maybe):

Maybe try running your ocaml toplevel inside a “M-x shell” buffer? If, in there, your camlp4r misbehaves, then a culprit could be comint. But I don’t expect it to misbehave. So, if it behaves correctly in a terminal window, and in a shell buffer, but misbehaves in a tuareg-mode interactive toplevel, then I’d go looking at what tuareg actually sends ocaml. A way to do this without working too hard, would be to write yourself a little shellscript that invokes ocaml, but does so inside a “script” command – which will log all inputs and outputs.

Then get tuareg to fire up that shellscript instead of the ocaml command, and after you repro the misbehaviour, you can exit the toplevel and inspect the log in the “typescript” file.

Just an idea.

Hi, Chet,
OCaml revised syntax is perfectly supported via M-x shell (then ocaml command, then #camlp4r directive).
If I intentionally put ;; instead of ; at the end of the phrase, then I get the same error message:
Error:Parse error: [semi] expected after [str_item] (in [phrase])

Tuareg appears to be the component that adds an incorrect “end of phrase” lexeme before sending the phrase to the REPL:

TEST done within the *OCaml* buffer:

1/ Write a value definition in revised syntax, with a valid SINGLE semicolon:
# value a = 1;

2/ Do M-x comint-send-input
That perfectly sends the phrase to the REPL that returns its evaluation:
value a : int = 1

I had a look at tuareg.el, especially at the Tuareg interactive mode section.
In tuareg.el, I replaced every “end of phrase” ;; with ; (so that buffer phrases can be correctly recognized), but this ugly hack didn’t work.

The function tuareg-interactive-send-input-end-of-phrase is bound to shift return (and to ctrl return).
One thing is still weird: after I replaced all the ;; with ;, shift return still adds a double semicolon and then sends the phrase to the toplevel.
How can you explain that?

Hi,
Tuareg does not support the revised syntax. Sending a double semicolon is not factored out (so as of now it cannot be readily changed; you have to redefine tuareg-interactive-send-input-end-of-phrase) but, more importantly, the differences in syntax mean that some phrases will be incorrectly detected. The revised syntax has long been deprecated and, now, the entire camlp4 system is too.
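
Since the end-of-phrase insertion is not factored out, one option is to leave tuareg.el alone and define a separate command in the Emacs init file. What follows is only a minimal sketch: my-ocaml-ensure-terminator and my-tuareg-send-revised-phrase are hypothetical names that do not exist in Tuareg, and the key binding assumes that the REPL keymap is called tuareg-interactive-mode-map in your Tuareg version. It does nothing about phrase detection in .ml buffers.

```elisp
;; Sketch only: the my-* names are hypothetical, not part of tuareg.el.
(require 'comint)

(defun my-ocaml-ensure-terminator (terminator)
  "Make the text at end of buffer (ignoring trailing whitespace) end with TERMINATOR.
Nothing is inserted if TERMINATOR is already there."
  (goto-char (point-max))
  (skip-chars-backward " \t\n")
  (let* ((end (point))
         (start (max (point-min) (- end (length terminator)))))
    (unless (string= (buffer-substring-no-properties start end) terminator)
      (insert terminator))))

(defun my-tuareg-send-revised-phrase ()
  "Send the pending REPL input, terminated by a single `;'."
  (interactive)
  (my-ocaml-ensure-terminator ";")
  (goto-char (point-max))
  (comint-send-input))

;; Assumption: the REPL keymap is named `tuareg-interactive-mode-map'.
(with-eval-after-load 'tuareg
  (when (boundp 'tuareg-interactive-mode-map)
    (define-key tuareg-interactive-mode-map (kbd "<S-return>")
      #'my-tuareg-send-revised-phrase)))
```

Only shift return is rebound here, so if ctrl return keeps its original binding it would still insert ;; for the ordinary syntax.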

Hi Chris,

I’ve done that in tuareg-interactive-send-input-end-of-phrase, but it’s not enough:

(defun tuareg-interactive-send-input-end-of-phrase ()
  (interactive)
  (goto-char (point-max))
  ;; (unless (equal ";;" (save-excursion (caddr (smie-backward-sexp))))
  ;; (insert ";;"))  
  ;; ugly hack to get tuareg send a single semicolon to the REPL
  (unless (equal ";" (save-excursion (caddr (smie-backward-sexp))))
    (insert ";"))
  (comint-send-input))

So I’ve even replaced every double ;; with a single ; (to allow correct phrase handling), but shift return still adds a ;; and sends the whole phrase to the REPL, which fails to evaluate it.
Can you explain where that could come from? Maybe a standard Emacs input behaviour?
Thanks.

PS: I should be able to handle legacy camlp4r code.

Are you sure you reevaluated the code (C-x C-e) or recompiled it? It works on my system.

I just do C-x C-e while being in the phrase (usually at the beginning), or I go in the OCaml buffer and do shift return to send the phrase to the Toplevel.
Only a pure M-x comint-send-input works correctly.

Regarding the modified tuareg.el: as I thought that everything is interpreted in Emacs, I didn’t recompile it.
I tried byte-recompile-file and byte-recompile-directory, which broke tuareg mode (maybe I changed too many ;;). So I took a fresh tuareg.el and changed only the following:

(defun tuareg-interactive-send-input-end-of-phrase ()
  (interactive)
  (goto-char (point-max))
  ;; (unless (equal ";;" (save-excursion (caddr (smie-backward-sexp))))
    ;; (insert ";;"))
  (unless (equal ";" (save-excursion (caddr (smie-backward-sexp))))
    (insert ";"))
  (comint-send-input))

But it still adds a double semicolon.
Can you tell me which change you made to tuareg.el or whatever that works on your system?
It looks like an Elisp/Emacs issue (related to my limited Elisp knowledge).

I just do C-x C-e while being in the phrase (usually at the beginning), or I go in the OCaml buffer and do shift return to send the phrase to the Toplevel. Only a pure M-x comint-send-input works correctly.

I understood your aim as evaluating phrases in the toplevel (aka REPL). Evaluating phrases in a .ml buffer (C-x C-e) will likely not work as the phrase will be incorrectly detected.

Regarding the modified tuareg.el: as I thought that everything is interpreted in Emacs, I didn’t recompile it.

It depends on how you installed Tuareg. When it is installed through opam or the Emacs package interface, the files are byte-compiled, so changing the source alone has no effect.
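
To check which definition Emacs is actually running, you can ask where a function was loaded from. Here is a minimal sketch using the standard symbol-file and byte-code-function-p functions (my-function-origin is a hypothetical helper name, not something Tuareg provides):

```elisp
(defun my-function-origin (fn)
  "Return a plist saying which file defined FN and whether FN is byte-compiled."
  (list :file (symbol-file fn 'defun)
        :byte-compiled (byte-code-function-p (symbol-function fn))))

;; Example call, to be evaluated once Tuareg is loaded:
;; (my-function-origin 'tuareg-interactive-send-input-end-of-phrase)
```

If :file ends in .elc, the compiled file is what was loaded, and edits to tuareg.el should only take effect after recompiling or after re-evaluating the changed defun. On builds with native compilation, :byte-compiled may be nil even for compiled code, so treat the result as a hint rather than a proof.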

I tried byte-recompile-file and byte-recompile-directory, which broke tuareg mode (maybe I changed too many ;;). So I took a fresh tuareg.el and changed only the following:

(defun tuareg-interactive-send-input-end-of-phrase ()
  (interactive)
  (goto-char (point-max))
  ;; (unless (equal ";;" (save-excursion (caddr (smie-backward-sexp))))
  ;;   (insert ";;"))
  (unless (equal ";" (save-excursion (caddr (smie-backward-sexp))))
    (insert ";"))
  (comint-send-input))

But it still adds a double semicolon.

In an elisp buffer, you can evaluate a phrase by placing the cursor (point) at the end of it and typing C-x C-e. Redefining this function will only affect “shift return” in the toplevel, not the evaluation of phrases in a buffer.

Best,
C.

P.S. Frankly, I do not think the matter is worth all that trouble. Just convert your files to the regular syntax (camlp4 can do that for you) and continue from there. This will also ensure your code stays compilable with newer versions of the compiler (that camlp4 may not support).

I understood your aim as evaluating phrases in the toplevel (aka REPL). Evaluating phrases in a .ml buffer (C-x C-e) will likely not work as the phrase will be incorrectly detected.

That’s why I replaced all the ;; with ; in tuareg.el. But I may have made mistakes.
; also denotes a sequence:
ordinary OCaml syntax: e1; e2; e3; e4
revised (unambiguous) OCaml syntax: do { e1; e2; e3; e4 }

At least, when in the OCaml buffer, shift return should just add a single semicolon before calling comint-send-input (doing M-x comint-send-input when in the OCaml buffer sends the phrase in revised OCaml syntax which is correctly evaluated).
I don’t understand why tuareg-interactive-send-input-end-of-phrase still adds ;; after this change (;; replaced with ;).

I think that this topic may interest all people who aim at understanding and customizing their OCaml text editor/IDE. And of course the OCaml programming language itself.

Regarding using camlp4o/camlp4r or not:
You must be talking about the “new and now dead/deprecated camlp4” (from 2007, OCaml 3.10). I’m talking about the “legacy and genuine camlp4”, whose name is now camlp5. camlp5 works pretty well, is used by Coq and is perfectly maintained (7.10 released 2019-08-25 for OCaml 4.10.0).
@mjambon clearly explains “the difference between camlp4 and camlp5” in his very nice camlp5 tutorial https://mjambon.github.io/mjambon2016/extend-ocaml-syntax.html
AFAIK (this is now my personal opinion), camlp5 is reliable and much more powerful than the recent and intentionally limited ppx system, which surprisingly must now be redesigned because “every OCaml release can potentially break your code” (see The future of ppx).

Note, that camlp4 is deprecated and discontinued.

Yes, the “new (in 2007) and now dead/deprecated camlp4” is no longer supported.
But camlp5 is supported (camlp5 is the new name for the “legacy and genuine camlp4”).

Have you any idea for fixing this Tuareg/Emacs issue?
It’s quite a common issue when we deal with OCaml dialects (MetaOCaml or any OCaml variants).

EDIT:
This topic could have been named:
camlp4r/camlp5r : using a correct end of phrase (single semicolon) in OCaml REPL

Camlp5 is supported, but was problematic, until I complained on this discussion board - see the camlp5 and 4.08 saga. So, while technically it is still alive and maintained, it is not updated very often. Thus many projects, including Coq, ditched the dependency. Now only a handful of the projects depend on it:

That is far from enough. This change only deals with adding a final ;; if there is none. Phrase detection is based on SMIE, which relies on an (approximate) representation of the OCaml grammar. So supporting a different syntax means changing this too. This can be done, but someone has to step in to lay down a proposal and then do the work.

In tuareg.el I just replaced ;; with ; and also replaced caddr with cadr (in tuareg-interactive-send-input and tuareg-interactive-send-input-end-of-phrase), then recompiled.
Now, when I’m in the OCaml buffer, a single semicolon is correctly added and camlp5r evaluates the legal phrase it receives in revised OCaml syntax.
But if there is only one phrase in the buffer, C-x C-e still adds a double semicolon to the phrase under the point (which has only one semicolon in revised syntax).
And if there are several phrases, Tuareg keeps going until it hits the double semicolon it is looking for, sending all the invalid phrases it encounters to the toplevel.
I need to look deeper at tuareg.el.
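
While phrase detection still looks for ;;, one possible workaround is to bypass it and send an explicitly selected region to the toplevel. This is only a sketch under stated assumptions: the my-* names are hypothetical, and the toplevel is assumed to run in a buffer named *OCaml*, as in the test described earlier in this thread.

```elisp
;; Sketch only: the my-* names are hypothetical, not part of tuareg.el.
(require 'comint)
(require 'subr-x)                       ; string-trim-right on older Emacs

(defun my-ocaml-terminate (text terminator)
  "Return TEXT without trailing whitespace, ending with TERMINATOR."
  (let ((trimmed (string-trim-right text)))
    (if (string-suffix-p terminator trimmed)
        trimmed
      (concat trimmed terminator))))

(defun my-ocaml-send-region-revised (start end)
  "Send the text between START and END to the toplevel, ending with `;'."
  (interactive "r")
  ;; Assumption: the toplevel process lives in the buffer named *OCaml*.
  (let ((proc (get-buffer-process "*OCaml*")))
    (unless proc
      (user-error "No toplevel process found in *OCaml*"))
    (comint-send-string
     proc
     (concat (my-ocaml-terminate (buffer-substring-no-properties start end) ";")
             "\n"))))
```

Because the region is chosen by hand, no grammar knowledge is needed. The trade-off is that you have to select exactly one complete phrase yourself.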

Yes, I discovered that situation while studying tuareg.el. SMIE (“Simple Minded” Indentation Engine) is far from being wonderful.
And now that’s exactly what I’m facing: SMIE approximately analyzes the buffer, looking for patterns and for double semicolons.

BTW, this may partly explain why we have been forced for years to put those ugly double semicolons in a program to be able to evaluate some parts in the Toplevel (personally, that’s why I mainly evaluate selected regions).
A syntax analyzer with a complete knowledge of OCaml syntax does not need to get a double semicolon to recognize the end of a phrase.
For example, a new let foo = ... is enough to indicate the end of the previous phrase.

One valid reason I can see for putting a ;; at the end of an OCaml phrase is to help the compiler evaluate the region marked this way, so that it can give more helpful and focused error messages.

The existing “hacks” in tuareg.el seem related to some ambiguities of the OCaml ordinary/normal syntax (compared to OCaml revised syntax):

Ex: tuareg.el, version 2.2.0
Line 1202 (let and value):
(defconst tuareg-smie-grammar
  ;; Problems:
  ;; - "let D in E" expression vs "let D" declaration.  This is solved
  ;;   by making the lexer return "d-let" for the second case.

Line 1739 (| for constructor, match…with, function, …):
(defun tuareg-smie-rules (kind token)
  ;; FIXME: Handling of "= |", "with |", "function |", and "[ |" is
  ;; problematic.

The (very slightly) revised OCaml syntax makes a clear difference between let and let..in, whose AST representations are very different (let is a structure item, and let...in is an expression).
(* ordinary/normal OCaml syntax *)
let a = 1;;
let a = 1 in let b = 2 in a + b;;

(* (slightly) revised OCaml syntax *)
value a = 1;
let a = 1 in let b = 2 in a + b;

Of course, the revised OCaml syntax is very rarely used by the OCaml community (even though it is explicitly unambiguous, without begin..end, implicit else or unclear ; sequences. That property is very helpful when it comes to dealing cleanly with parsers and extensible grammars, without hacks).

I suppose that it could be of interest for the whole community to improve this tuareg extension.
But I’ve never really heard people complaining about that and asking for that improvement.
I see that you are one of the authors of tuareg. And you may hear frequent requests.

What do you think about improving tuareg for OCaml syntax?

This is incorrect. I personally barely use semicolons in my programs. Yet Tuareg tries to send the phrase around the point to the toplevel — and this is why it requires knowledge of the OCaml grammar.

I think I already answered that point above: this can be done, but someone has to step in to lay down a proposal and then do the work. The interference with the current Tuareg code must be light, or the new code must be in a separate file (which the author commits to maintain in the long term).