Good references on the theory of parsing

This is a little off-topic, but I don’t really know where to ask this question: I’m looking for good (comprehensive) references on the theory of parsing. My current specific question is on the various reductions between LL(k) and LR(k) languages, but really, I’d like to get a comprehensive study of all the various classes of CFGs and context-free languages, and their relations, including algorithms for reducing from one class to another.

I’m digging around, but surely there is a canonical text to be had in this area (it’s so old).

Anybody got any favorites?

2 Likes

Here are the official lecture notes from a Formal Languages class I took in college that might cover some of what you’re looking for! Basically everything in the class is in these lecture notes, they are really comprehensive. Best of luck!

1 Like

Thanks for the pointer. This looks like the class I took back in college, with the HU book (the one with the girl on a swing on the cover). I’m looking for … the next level up, with all the rest of the story … the stuff that typically isn’t taught in the undergrad course, b/c … well, it’s not very useful.

1 Like

What does that mean? Isn’t the point of language classes that they don’t overlap entirely?

There are some inclusions between classes:

1 Like

Sorry, I’m probably speaking inaccurately. There are grammar-classes LL(k), LR(k). And from what I understand, there are reduction algorithms that allow to take LR(k) and reduce to LR(1), and similarly from LL(k) to LR(k) (LR(1) ?) In any case, all of this stuff is … ancient, so the presentations can be (ahem) quite archaic. I’m looking for the most accessible presentation of these reductions and the various classes, parsing algorithms, etc.

To be sure, it’s not to actually implement these things: I’m hacking away on an ANTLR clone, and I’m perfectly happy to keep going. But I’m just getting interested in understanding the area better – purely for edification, not for any practical use.

P.S. I figured I’d ask only b/c there might be some really well-loved book out there. Absent that, I’m just digging thru the papers: this stuff is, after all, almost as old as most of us, if not older.

2 Likes

You might like Aho & Ullman’s book “The theory of parsing, translation, and compiling”, part one of which is all about parsing: https://dl.acm.org/doi/pdf/10.5555/578789.

I don’t know too much about this book but have used parts of it to supplement their famous and well-loved “Dragon Book”, which is great but doesn’t go into quite as much detail as you’re looking for wrt language/parsing theory.

2 Likes

Oh, nice! Thank you, this covers the other grammar classes; I’ll have to see whether they cover the algorithms to convert between various classes, but this is at least a good start!

1 Like

I recommend, as a general reference: “Parsing Techniques: A Practical Guide” by Grune and Jacobs. It’s encyclopedic, but unfortunately incomplete in certain areas like practical algorithms for LR(1) automaton construction.

Aho & Ullman is a good start but it’s a bit old. It doesn’t (for example) include any of the 1970s era techniques for practical LR(1) parsing, or any of the more modern updates to them. (That said, as I mentioned, Grune and Jacobs doesn’t include that either!)

My own experience is that if you want to implement an of the algorithms in the Dragon Book, you will find, surprisingly, that there can be big gaps in the exposition that can lead to days of frustration when you realize vital corner cases aren’t explained. For example, if you attempt to implement the LR(1) automaton construction algorithm in the Dragon book, you will discover that it doesn’t work for recursive grammars, which is to say, all grammars of any interest. You will then find that loads of people online will handwave and say that the extension to recursive grammars is trivial and not worth discussing; perhaps it was trivial for them, but figuring it out for myself took several days which a couple of paragraphs of additional exposition might have solved in minutes.

7 Likes