Writing the most basic Newick parser for Unity3D (C# or ActionScript)

I'm trying to work out how to read Newick files for many animal species, and I couldn't find a "logical method / process" for breaking down a Newick string in a simple programming language. I can read C#, AS, JS, GLSL and HLSL.

I can't find any simple resources, and the wiki article doesn't even mention recursion. Pseudocode for parsing Newick would be great, but I can't find any.

Does anyone know the fastest way to read a Newick file in Unity3D? Can you help me get on the right track with the logical process for breaking down the Newick code, i.e.



the number of branches is not important at the moment.

target project file:





2 answers

Implementing a parser can be tough if you have no background in formal grammars. The simplest approach is to use a parser generator such as ANTLR: you then only need to get familiar with the grammar notation, and you can generate a parser written in C# from the grammar.

Fortunately, you can find a Newick grammar online: here .


If you would rather write the parser by hand, a simple recursive-descent implementation looks like this:

using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Branch
{
    public double Length { get; set; }
    public List<Branch> SubBranches { get; set; } = new List<Branch>();
}

public class Leaf : Branch
{
    public string Name { get; set; }
}

public class Parser
{
    private int currentPosition; // index of the next unread character
    private readonly string input;

    public Parser(string text)
    {
        // Strip all whitespace so the parser never has to skip it.
        input = new string(text.Where(c => !char.IsWhiteSpace(c)).ToArray());
        currentPosition = 0;
    }

    public Branch ParseTree()
    {
        currentPosition++; // '('
        var branches = ParseBranchSet();
        currentPosition++; // ')'
        return new Branch { SubBranches = branches };
    }

    private List<Branch> ParseBranchSet()
    {
        var ret = new List<Branch> { ParseBranch() };
        while (PeekCharacter() == ',')
        {
            currentPosition++; // ','
            ret.Add(ParseBranch());
        }
        return ret;
    }

    private Branch ParseBranch()
    {
        var tree = ParseSubTree();
        currentPosition++; // ':'
        tree.Length = ParseDouble();
        return tree;
    }

    private Branch ParseSubTree()
    {
        if (char.IsLetter(PeekCharacter()))
        {
            return new Leaf { Name = ParseIdentifier() };
        }

        currentPosition++; // '('
        var branches = ParseBranchSet();
        currentPosition++; // ')'
        return new Branch { SubBranches = branches };
    }

    private string ParseIdentifier()
    {
        var identifier = "";
        char c;
        while ((c = PeekCharacter()) != 0 && (char.IsLetter(c) || c == '_'))
        {
            identifier += c;
            currentPosition++;
        }
        return identifier;
    }

    private double ParseDouble()
    {
        var num = "";
        char c;
        while ((c = PeekCharacter()) != 0 && (char.IsDigit(c) || c == '.'))
        {
            num += c;
            currentPosition++;
        }
        return double.Parse(num, CultureInfo.InvariantCulture);
    }

    // Returns the next unread character, or '\0' at the end of the input.
    private char PeekCharacter()
    {
        return currentPosition < input.Length ? input[currentPosition] : (char)0;
    }
}


Which can be used like this:

var tree = new Parser("((A:1, B:2):3, C:4)").ParseTree();


BTW the above parser implements the following grammar without any error handling:

Tree -> "(" BranchSet ")"   
BranchSet -> Branch ("," Branch)*   
Branch -> Subtree ":" NUM
Subtree -> IDENTIFIER | "(" BranchSet ")"
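
Since the parser above skips error handling, here is a rough sketch of how the same grammar could be parsed with basic checks added. It is written in JavaScript (one of the languages mentioned in the question) and is not part of the original answer; the names parseNewick, expect, take and branchLength are made up for illustration, and it accepts only the restricted grammar shown above (no trailing ";", no quoted names).

```javascript
// Hypothetical sketch: one function per grammar rule, with basic error checks.
function parseNewick(text) {
  const input = text.replace(/\s+/g, ""); // drop whitespace, as the C# parser does
  let pos = 0;                            // index of the next unread character

  const peek = () => (pos < input.length ? input[pos] : "");

  // Consume one expected character or fail with the offending position.
  function expect(ch) {
    if (peek() !== ch) {
      throw new Error(`Expected '${ch}' at position ${pos}, found '${peek()}'`);
    }
    pos++;
  }

  // Consume a match of a sticky regex starting exactly at the current position.
  function take(regex, what) {
    regex.lastIndex = pos;
    const m = regex.exec(input);
    if (!m) throw new Error(`Expected ${what} at position ${pos}`);
    pos += m[0].length;
    return m[0];
  }

  // BranchSet -> Branch ("," Branch)*
  function parseBranchSet() {
    const branches = [parseBranch()];
    while (peek() === ",") {
      pos++;
      branches.push(parseBranch());
    }
    return branches;
  }

  // Branch -> Subtree ":" NUM
  function parseBranch() {
    const node = parseSubtree();
    expect(":");
    node.branchLength = Number(take(/\d+(\.\d+)?/y, "a number"));
    return node;
  }

  // Subtree -> IDENTIFIER | "(" BranchSet ")"
  function parseSubtree() {
    if (peek() === "(") {
      pos++;
      const children = parseBranchSet();
      expect(")");
      return { children };
    }
    return { name: take(/[A-Za-z_]+/y, "an identifier") };
  }

  // Tree -> "(" BranchSet ")"
  expect("(");
  const root = { children: parseBranchSet() };
  expect(")");
  if (pos !== input.length) throw new Error(`Unexpected input at position ${pos}`);
  return root;
}

const tree = parseNewick("((A:1, B:2):3, C:4)");
console.log(JSON.stringify(tree, null, 2));
```

With this shape, a malformed string such as "((A:1,B:2)3,C:4)" fails with a message naming the position instead of silently producing a wrong tree.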




If you are interested in converting Newick to JSON / a regular object, I think I was able to find a solution.

A quick Google search gave me links to an implementation in JS:

And it was not difficult for me to port the JS code to AS3:

// The function that converts a Newick string into a JSON-like object
function convertNewickToJSON(source:String):Object
{
    var ancestors:Array = [];
    var tree:Object = {};
    var tokens:Array = source.split(/\s*(;|\(|\)|,|:)\s*/);
    var subtree:Object;
    for (var i:int = 0; i < tokens.length; i++)
    {
        var token:String = tokens[i];
        switch (token)
        {
            case '(': // new children
                subtree = {};
                tree.children = [subtree];
                ancestors.push(tree);
                tree = subtree;
                break;
            case ',': // another branch
                subtree = {};
                ancestors[ancestors.length - 1].children.push(subtree);
                tree = subtree;
                break;
            case ')': // optional name next
                tree = ancestors.pop();
                break;
            case ':': // optional length next
                break;
            default:
                var x:String = tokens[i - 1];
                if (x == ')' || x == '(' || x == ',')
                {
                    tree.name = token;
                }
                else if (x == ':')
                {
                    tree.branch_length = parseFloat(token);
                }
        }
    }
    return tree;
}

// Util function for converting an object into a string
function objectToStr(obj:Object, paramsSeparator:String = "", isNeedUseSeparatorForChild:Boolean = false):String
{
    var str:String = "";
    if (isSimpleType(obj))
    {
        str = String(obj);
    }
    else
    {
        var childSeparator:String = "";
        if (isNeedUseSeparatorForChild)
        {
            childSeparator = paramsSeparator;
        }
        for (var propName:String in obj)
        {
            if (str == "")
            {
                str += "{ ";
            }
            else
            {
                str += ", ";
            }
            str += propName + ": " + objectToStr(obj[propName], childSeparator) + paramsSeparator;
        }
        str += " }";
    }
    return str;
}

// One more util function: true for strings, numbers and booleans
function isSimpleType(obj:Object):Boolean
{
    var isSimple:Boolean = false;
    if (typeof(obj) == "string" || typeof(obj) == "number" || typeof(obj) == "boolean")
    {
        isSimple = true;
    }
    return isSimple;
}

var tempNewickSource:String = "((((((Falco_rusticolus:0.846772,Falco_jugger:0.846772):0.507212,(Falco_cherrug:0.802297,Falco_subniger:0.802297):0.551687):0.407358,Falco_biarmicus:1.761342):1.917030,(Falco_peregrinus:0.411352,Falco_pelegrinoides:0.411352):3.267020):2.244290,Falco_mexicanus:5.922662):1.768128,Falco_columbarius:7.69079)";
var tempNewickJSON:Object = this.convertNewickToJSON(tempNewickSource);
var tempNewickJSONText:String = objectToStr(tempNewickJSON);
trace(tempNewickJSONText);


The code above produces the following trace:

{ name: , children: { 0: { name: , children: { 0: { name: , children: { 0: { name: , children: { 0: { name: , children: { 0: { name: , children: { 0: { name: Falco_rusticolus, branch_length: 0.846772 }, 1: { name: Falco_jugger, branch_length: 0.846772 } }, branch_length: 0.507212 }, 1: { name: , children: { 0: { name: Falco_cherrug, branch_length: 0.802297 }, 1: { name: Falco_subniger, branch_length: 0.802297 } }, branch_length: 0.551687 } }, branch_length: 0.407358 }, 1: { name: Falco_biarmicus, branch_length: 1.761342 } }, branch_length: 1.91703 }, 1: { name: , children: { 0: { name: Falco_peregrinus, branch_length: 0.411352 }, 1: { name: Falco_pelegrinoides, branch_length: 0.411352 } }, branch_length: 3.26702 } }, branch_length: 2.24429 }, 1: { name: Falco_mexicanus, branch_length: 5.922662 } }, branch_length: 1.768128 }, 1: { name: Falco_columbarius, branch_length: 7.69079 } } }


Thus, this approach makes it possible to work with the Newick format as with JSON.
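
To illustrate what "working with it as JSON" can look like, here is a small JavaScript sketch that walks a tree of the shape shown in the trace (name, branch_length, children). The helper names leafNames and maxRootToLeafDistance and the sample tree are invented for this example; they are not part of the ported code.

```javascript
// Hypothetical helpers for a tree shaped like the trace above:
// { name, branch_length, children: [...] }.

// Collect the names of all leaves (nodes without children), left to right.
function leafNames(node) {
  if (!node.children || node.children.length === 0) {
    return [node.name];
  }
  return node.children.flatMap((child) => leafNames(child));
}

// Largest sum of branch lengths along any path from this node down to a leaf.
function maxRootToLeafDistance(node) {
  const own = node.branch_length || 0; // the root has no branch_length
  if (!node.children || node.children.length === 0) {
    return own;
  }
  return own + Math.max(...node.children.map((c) => maxRootToLeafDistance(c)));
}

// Small hand-written tree in the same shape, equivalent to ((A:1,B:2):3,C:4).
const sample = {
  name: "",
  children: [
    {
      name: "",
      branch_length: 3,
      children: [
        { name: "A", branch_length: 1 },
        { name: "B", branch_length: 2 },
      ],
    },
    { name: "C", branch_length: 4 },
  ],
};

console.log(leafNames(sample).join(", "));   // A, B, C
console.log(maxRootToLeafDistance(sample));  // 5
```

Both helpers are plain recursion over the children arrays, which is the same traversal you would use to build scene objects from the tree.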

As per the title, you're interested not only in C# but also in an AS3 implementation (I'm not sure whether you can use it in C# right out of the box, but you can probably port it to C#).
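
If you do port it, the step worth checking first is the tokenizer: convertNewickToJSON relies on split keeping the captured separators in the result array. The snippet below is only an illustration in JavaScript, using the same regular expression on a tiny made-up input.

```javascript
// Same regular expression as in convertNewickToJSON, applied to a tiny input.
// Because the separators sit in a capture group, split() keeps them as tokens.
const tokens = "(A:1, B:2)".split(/\s*(;|\(|\)|,|:)\s*/);

console.log(JSON.stringify(tokens));
// ["","(","A",":","1",",","B",":","2",")",""]
```

Note the empty strings at both ends: the one after the final ")" is what ends up as the empty name on the root and on every unnamed inner node in the trace. A port needs a split (or a small hand-written tokenizer) that behaves the same way.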


