How to "rewind" when traversing a syntax tree?

olleharstedt · April 11, 2021, 3:36pm

Hi!

I have a situation when traversing my AST where I need to “rewind” and output something in the previous match case.

Case in point:

local p = new Point {1, 2};

should generate the following C-code:

Point __p = {1, 2};
Point *p = &p;

But the AST is:

Assignment (
    Struct_typ (Local, "Point"),
    "p",
    New (
        "Point",
        [
            (("x", Int), Num 1);
            (("y", Int), Num 2);
        ]
    )
);

so when matching on it, the “new” is discovered “too late”. Further more, the line can be seen as syntactic sugar for

let p = new local Point{1, 2};

I’m on the fence where to put the locality kind here…

Possible solutions:

Scan syntax tree twice to insert needed lines
Don’t make “compile_tree_to_c_code” return a string, but instead use it to put lines into a queue which can be manipulated.

Any ideas? Is there an off-the-shelf solution?

Olle

(Full code: escape-c/ast.ml at master · olleharstedt/escape-c · GitHub)

tominated · April 11, 2021, 4:43pm

While it might be far from an optimal solution, you can pattern match on nested constructors, and use a wildcard for others, like so:

match term with
| Assignment (ty, id, New (struct_name, struct_init)) -> (* do something *)
| Assignment (ty, id, expr) -> (* do another thing *)

This lets you handle the specific case up-front instead of backtracking, but it can be hard to scale to a lot of cases. I’m still learning about compilers myself so I’m not totally sure what the best solution is but I hope this is a good stopgap!

olleharstedt · April 11, 2021, 5:14pm

That doesn’t scale well. Imagine the following line:

local points = [new Point {1, 2}, new Point {2, 3}];

which should evaluate to:

Point __p = {1, 2};
Point __p2 = {2, 3};
Point* points = [&__p, &__p2];

So each time the traversal stumbles upon a “new”, it has to know which implied locality to apply. So a state in the traversal…? Or there needs to be another traversal first to insert the correct locality into the tree.

tominated · April 11, 2021, 5:57pm

Yeah it sounds like there’ll need to be some statefulness (not necessarily mutable though) - a map with bindings seems common when doing this sort of traversal in a type-checker at least (I still haven’t gotten around to codegen in my project, so I’m not all that familiar).

Perhaps your recursive function could return both the string of code, and a map of bindings it made (or something similar)? That way your leaf nodes can tell their parents that there’s a value to do stuff with.

I know this is super vague but i’m still figuring this stuff out myself

olleharstedt · April 11, 2021, 5:58pm

Oh yeah, returning something more than a string might be an interesting take…

vrotaru · April 11, 2021, 7:01pm

You can probably, just add an additional argument to you function which holds a list of visited nodes (so that you can go back, if there is need).

Also, zippers. But usually, that’s not very ergonomic

olleharstedt · April 11, 2021, 7:41pm

Or maybe put the strings into a tree that is rendered breadth first? Might give more control over the order.

Chet_Murthy · April 13, 2021, 6:59am

[I could be wildly mistaken here, but] This smells like some sort of “give names to subexpressions and then translate” problem? That is to say, you want to first do a sort of flattening of the complex expressions, so that each expression is a simple function-call, with arguments that are themselves only values or variables? Felleisen & Sabry call this “A-Translation”, and this general idea of “name nontrivial subexpressions and flatten them so that they are sequentially laid-out” is often useful as part of converting to some simpler language. You could implement this by iteratively rewriting your program, e.g.

local t = new Tree { new Tree { new Tree { new Leaf { 1}, new Leaf {2} }, 
                                                  new Leaf { 3} } ,
                                new Leaf { 4 } }

The outermost rewrite gives

local t1 = new Tree { new Tree { new Leaf { 1}, new Leaf {2} }, new Leaf { 3} }
local t2 = new Leaf { 4 }
local t = new Tree { t1 , t2 }

and then the next rewrite start at the earliest term that was modified, giving

local t3 = new Tree { new Leaf { 1}, new Leaf {2} }
local t4 = new Leaf { 3}
local t1 = new Tree { t3, t4 }
local t2 = new Leaf { 4 }
local t = new Tree { t1 , t2 }

and so on. The idea that your language should be capable of naming all intermediate expressions (or you’ll need to add that capability), and then you do all the rewriting in-language, only converting to C after you’ve got things in a sufficiently explicit form.

olleharstedt · April 14, 2021, 10:02pm

Did some testing, and feel like a second pass would be easiest. This is a toy example for a syntax tree with only two nodes: assignment and stack_alloc:

type stmt = Assignment | Stack_alloc

let rec insert_stack_alloc (stmts : stmt list) = match stmts with
    | [] -> []
    (** TODO: Needs guard to check for New expressions, and a function to
              fetch all those expression *)
    | Assignment::tail -> Stack_alloc :: Assignment :: insert_stack_alloc tail
    | Stack_alloc::_ -> failwith "Shouldn't visit Stack_alloc in this pass"

ivg · April 29, 2021, 9:09pm

Instead of rewinding just do not emit prematurely The common technique, used in AST folding, is maintaining a stack for keeping a partially ready example and emitting the code once it is ready. The stack, basically, is a reverse polish notation (RPN) of your currently visited expression. This is a common technique used in uplifting from lower to higher-level languages, e.g., in decompilers and other lifters.

olleharstedt · April 30, 2021, 8:26am

Cool, thanks! Didn’t know. A stack makes sense.

Topic		Replies	Views
How to correctly make a selection of all locations in a structure that match a pattern (memoization)? Learning	3	616	December 3, 2018
Tail cascade: a new indentation style for some OCaml constructs Learning	10	2289	November 26, 2019
AST Visitor Designs Learning	2	1333	May 17, 2018
AST tracking of type variables in polymorphic let-bindings (an old question) Ecosystem type-system	2	837	June 15, 2021
Tag the AST after each pass to enforce a specific order Learning announce , language-design	0	577	April 17, 2021

How to "rewind" when traversing a syntax tree?

Related topics