FAQ
Hi all --

in the spirit of diving right into the deep end, I'm learning Go by
writing an interpreter for a simple language. A simplified subset of
the grammar:

statement:
assignment EOL

assignment:
NAME '=' expr

expr:
STRING
expr + expr
(In reality, I broke expr up into multiple productions to avoid
ambiguities. I said this was a *simplified* subset!)

This parser would accept

a = "foo"
b = "bar"
c = "foo" + "bar"

Simple enough. I'm pretty sure the output of the parser should be an
AST (abstract syntax tree), but I'm unsure if I've got the details
right. Here's what it looks like:

type ASTNode interface {
Dump(writer io.Writer, indent string)
Equal(other ASTNode) bool
}

type ASTExpression interface {
// this mainly exists because I feel I need a way to distinguish
// expression nodes... and implementing String() is easy and useful
String() string
}

// implements ASTNode
type ASTAssignment struct {
target string // variable name
expr ASTExpression
}

// implements ASTNode and ASTExpression
type ASTString struct {
value string
}

// implements ASTNode and ASTExpression
type ASTAdd struct {
op1 ASTExpression
op2 ASTExpression
}

The implementations of Dump(), Equal(), and String() for those types
is straightforward and a bit repetitive, so I'll spare you. I'm sure
I'll have to add ASTNode.Walk() eventually.

This seemed like the most obvious, straightforward design to me. It's
also quite similar to a prototype I threw together in Python. But now
that I'm getting into the real world of AST walking, I sense a lot of
type assertions in my future, which seems to miss the point of having
interfaces like ASTNode and ASTExpression.

The only other design that I can see is one big concrete mudball:

const (
nt_root = iota
nt_stmt
nt_assign
nt_string
nt_add
)

type ASTNode struct {
nodetype int
children []ASTNode
data string
}

If nodetype == nt_root, then children is a list of statement nodes. If
nodetype == nt_add, then children is {op1, op2}. Etc. AST walking is
then one recursive function with a big switch on nodetype. This seems
pretty vile to me, and it only occurred to me after reading about AST
walking articles by a couple of ANTLR hackers (specifically
http://www.antlr.org/article/1170602723163/treewalkers.html).

Thoughts? Is my approximately-one-type-per-grammar-rule the usual
approach? Is there a nice middle ground that I'm missing?

Thanks --

Greg

--

Search Discussions

  • Rob Pike at Nov 1, 2012 at 12:39 am
    Embedding can help a lot here. Look for instance at how
    text/template/parse/node.go defines its types, or the use of
    commonType in the Type implementations in reflect/type.go, which is
    not an AST but has similar issues.

    -rob

    --
  • Jim Teeuwen at Nov 1, 2012 at 1:21 am
    The embedding approach usually works for me. Although it depends on how
    complex the parser has to be.
    Here are two examples I did.

    A very simple parser, without embedding, as it was not necessary.
    It parses knitting patterns:

    * https://github.com/jteeuwen/knit

    A more complex one which does use type embedding to alleviate some of the
    repetitive boilerplate.
    Notably the nodeBase type which is embedded by all AST node types.
    It parses an assembly language with some custom additions:

    * https://github.com/jteeuwen/dcpu/tree/master/parser


    Note that these (specially the second one) are far from ideal. But they get
    the job done.

    On Thursday, November 1, 2012 1:05:30 AM UTC+1, Greg Ward wrote:

    Hi all --

    in the spirit of diving right into the deep end, I'm learning Go by
    writing an interpreter for a simple language. A simplified subset of
    the grammar:

    statement:
    assignment EOL

    assignment:
    NAME '=' expr

    expr:
    STRING
    expr + expr
    (In reality, I broke expr up into multiple productions to avoid
    ambiguities. I said this was a *simplified* subset!)

    This parser would accept

    a = "foo"
    b = "bar"
    c = "foo" + "bar"

    Simple enough. I'm pretty sure the output of the parser should be an
    AST (abstract syntax tree), but I'm unsure if I've got the details
    right. Here's what it looks like:

    type ASTNode interface {
    Dump(writer io.Writer, indent string)
    Equal(other ASTNode) bool
    }

    type ASTExpression interface {
    // this mainly exists because I feel I need a way to distinguish
    // expression nodes... and implementing String() is easy and useful
    String() string
    }

    // implements ASTNode
    type ASTAssignment struct {
    target string // variable name
    expr ASTExpression
    }

    // implements ASTNode and ASTExpression
    type ASTString struct {
    value string
    }

    // implements ASTNode and ASTExpression
    type ASTAdd struct {
    op1 ASTExpression
    op2 ASTExpression
    }

    The implementations of Dump(), Equal(), and String() for those types
    is straightforward and a bit repetitive, so I'll spare you. I'm sure
    I'll have to add ASTNode.Walk() eventually.

    This seemed like the most obvious, straightforward design to me. It's
    also quite similar to a prototype I threw together in Python. But now
    that I'm getting into the real world of AST walking, I sense a lot of
    type assertions in my future, which seems to miss the point of having
    interfaces like ASTNode and ASTExpression.

    The only other design that I can see is one big concrete mudball:

    const (
    nt_root = iota
    nt_stmt
    nt_assign
    nt_string
    nt_add
    )

    type ASTNode struct {
    nodetype int
    children []ASTNode
    data string
    }

    If nodetype == nt_root, then children is a list of statement nodes. If
    nodetype == nt_add, then children is {op1, op2}. Etc. AST walking is
    then one recursive function with a big switch on nodetype. This seems
    pretty vile to me, and it only occurred to me after reading about AST
    walking articles by a couple of ANTLR hackers (specifically
    http://www.antlr.org/article/1170602723163/treewalkers.html).

    Thoughts? Is my approximately-one-type-per-grammar-rule the usual
    approach? Is there a nice middle ground that I'm missing?

    Thanks --

    Greg
    --

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-nuts @
categoriesgo
postedNov 1, '12 at 12:05a
activeNov 1, '12 at 1:21a
posts3
users3
websitegolang.org

People

Translate

site design / logo © 2021 Grokbase