Parsing question - keywords

patrick-nicodemus · September 25, 2024, 1:54pm

This is not an OCaml question; it is a design question about writing a parser grammar. Many people in the OCaml community write parsers, though, so I hope it is appropriate here.

I forked someone’s tree-sitter grammar for Org, a markup language. The language has some commands relating to calendar events, for example:

SCHEDULED: <2024-09-28 Sat>
DEADLINE: <2024-10-05 Sat>
CLOSED: <2024-10-05 Sat>

All three keywords are documented as part of the language.

The grammar I was looking at parses such a line and returns a pair (keyword, date), where keyword is the string before the colon and date is the timestamp after it. The parser accepts any string of alphabetical characters as the keyword.

As far as I can tell, at some point in the business logic I have to do a case analysis on whether the keyword is SCHEDULED, DEADLINE or CLOSED, since these have different semantic meanings. I can either put that logic in my own code or modify the parser to do it. I have been following the rule of thumb that “any logic that can go in the parser is, by definition, parsing logic, so it should go in the parser”, so I want to move as much functionality as possible into the tree-sitter grammar itself. In this case that would mean adding the constant keywords “SCHEDULED”, “DEADLINE” and “CLOSED” to the grammar, and having the tree returned by the parser distinguish between the three at the level of node types.
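For comparison, here is a minimal OCaml sketch of what the case analysis looks like if it stays in application code. The names `planning_keyword`, `keyword_of_string` and `classify`, and the idea that the parser's output reaches me as a pair of strings, are assumptions for illustration; they are not part of the grammar or of any tree-sitter binding.

```ocaml
(* Hypothetical setup: the parser is assumed to hand back (keyword, date)
   as a pair of plain strings. *)
type planning_keyword = Scheduled | Deadline | Closed

(* The case analysis, done in application code rather than in the grammar. *)
let keyword_of_string = function
  | "SCHEDULED" -> Some Scheduled
  | "DEADLINE" -> Some Deadline
  | "CLOSED" -> Some Closed
  | _ -> None (* reachable, because the grammar accepts any alphabetical string *)

(* Turn the raw pair into a typed value, or None for an unrecognised keyword. *)
let classify (keyword, date) =
  Option.map (fun k -> (k, date)) (keyword_of_string keyword)
```

If the three keywords were distinct node types in the grammar, I would expect the `None` branch to disappear from my code, because an unrecognised keyword would be handled by the parser before it ever got this far.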

Is “Any logic that can go in the parser should go in the parser” a good rule of thumb, or is this a mistake?

I note that in this particular case:

- Instead of modifying the grammar, I could define three different parser queries, each with one particular keyword hardcoded into it, and then reuse those queries wherever they are needed.
- The culture of “hackability” around Emacs encourages extensibility and flexibility. Letting the keyword be an arbitrary alphabetical string allows users to augment the language with additional keywords as they see fit.
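One way to weigh these two points against each other would be to keep the grammar permissive and recognise the documented keywords in a thin layer on top, with a catch-all for user-defined ones. A minimal OCaml sketch, assuming the keyword arrives as a plain string; the type `open_keyword` and the functions `parse_keyword` and `print_keyword` are hypothetical names:

```ocaml
(* Hypothetical representation: the documented keywords get their own
   constructors, anything else is kept verbatim so that user-defined
   keywords survive. *)
type open_keyword =
  | Scheduled
  | Deadline
  | Closed
  | Custom of string

let parse_keyword = function
  | "SCHEDULED" -> Scheduled
  | "DEADLINE" -> Deadline
  | "CLOSED" -> Closed
  | other -> Custom other

(* Maps a keyword back to its source text, including custom keywords. *)
let print_keyword = function
  | Scheduled -> "SCHEDULED"
  | Deadline -> "DEADLINE"
  | Closed -> "CLOSED"
  | Custom s -> s
```

The trade-off is that the compiler still checks the three documented cases exhaustively, but every consumer also has to decide what to do with `Custom`.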
