Annotating nested ADTs in Haskell with additional data

While writing a compiler in Haskell, I ran into a specific problem a couple of times now while working with nested data types. I will have an ADT, something like

data AST = AST [GlobalDecl]

data GlobalDecl = Func Type Identifier [Stmt] | ...

data Stmt = Assign Identifier Exp | ...

data Exp = Var Identifier | ...


By doing some conversion to AST, I can briefly describe some additional data with variables that are used in the expression. All the options I've looked at so far for this seem pretty awkward. I can create a new datatype:

data Exp' = Var' Identifier ExtraInfo | ...


but it means that to create a slightly modified AST'

I need new definitions Stmt'

, GDecl'

. Another option is to add another data constructor to the original one Exp

, but use it only in that specific part of the program:

data Exp = Var Identifier | Var' Identifier ExtraInfo | ...


If you do this, the typechecker program can no longer prevent you from being mistakenly used Var'

in any other part of the program. A third option would be to simply store additional information permanently, even though it is not relevant to the rest of the program:

data Exp = Var Identifier ExtraInfo | ...


Okay, but it's ugly, especially if you need more information for a short while. At this point, I've just added some additional information to Map Indentifier ExtraInfo

and transferred it using AST

, either explicitly or via the state monad. This can get inconvenient quickly if, for example, you need to annotate different events of the same one Identifier

with different information.

Does anyone have any elegant methods for annotating nested data types?


source to share

2 answers

If you make all your types more polymorphic like:

data AST a = AST a
data GlobalDecl t i s = Func t i [s] | ...
data Stmt i e = Assign i e | ...
data Exp a = Var a | ...


then you can create them temporarily with a tuple - eg. Exp (Int, Identifier)

- for intermediate calculations. If necessary, you can do newtype

s for the specific types above for convenience.



One way to mark a structure with additional data is to use a higher type of type. If you only need to bind variables, you can do this for example.

data AST f = AST [GlobalDecl f]
data GlobalDecl f = Func Type Identifier [Stmt f] | ...
data Stmt = Assign Identifier (Exp f) | ...
data Exp f = Var (f Identifier) | ...


This is similar to what Peter suggested, but instead of making the types completely generic, it only parameterizes the part you want to change.

You will get your original, unlabeled structure with, AST Identity

or you can have a type type AST ((,) ExtraInfo)

that will turn Var (f Identifier)

into Var (ExtraInfo, Identifier)


If you need to tag each AST level with some additional information (like token positions), you can even define the data type as

data AST f = AST [f (GlobalDecl f)]
data GlobalDecl f = Func (f (Type f)) (f (Identifier f)) [f (Stmt f)] | ...
data Stmt f = Assign (f (Identifier f)) (f (Exp f)) | ...
data Exp f = Var (f (Identifier f)) | ...


AST ((,) ExtraInfo)

Will now contain additional information at each branch point in the syntax tree (granted, working with the specified structure will be a little cumbersome).



All Articles