Convert tab / space delimited strings to nested array

I would like to convert the below text to a nested array, something like what you get with the MPTT database structure.

I am getting data from a shell script and have to display it on a website. Don't control the format: /

There is a lot of information about array -> list, but not much happens the other way.

Any input would be appreciated, thanks.

cat, true cat
       => domestic cat, house cat, Felis domesticus, Felis catus
           => kitty, kitty-cat, puss
           => mouser
           => alley cat
           => tom, tomcat
               => gib
           => Angora, Angora cat
           => Siamese cat, Siamese
               => blue point Siamese
       => wildcat
           => sand cat
           => European wildcat, catamountain, Felis silvestris
           => cougar, puma, catamount, mountain lion, painter, panther, Felis concolor
           => ocelot, panther cat, Felis pardalis
           => manul, Pallas cat, Felis manul
           => lynx, catamount
               => common lynx, Lynx lynx
               => Canada lynx, Lynx canadensis



source to share

2 answers

You already have a sorted list of trees. Each next line is either a child of the previous line or a sibling. This way you can process the list, get the name of the element, get the level at which the element is indented, and create an element from it.

1 Line <=> 1 Element (level, name)


Thus, each element has a name and zero or more children. The input can also indicate to which level it belongs.

An element can be represented as an array, where the first value is the name and the second value is the array for the children.

When the list is sorted, we can use a simple map, which at the level is an alias for the children of a particular level. So, with the level that each item has, we can add it to the stack:

    $self = array($element, array());
    $stack[$level][] = &$self;
    $stack[$level + 1] = &$self[1];


As this example code shows, the stack / map for the current level gets $self

as children are added:

    $stack[$level][] = &$self;


Stack for level one above, get the reference to the children $self

(index 1


    $stack[$level + 1] = &$self[1];


So now for each line, we need to find the level. As this stack shows, the level is sequentially numbered: 0, 1, 2, ...

but there are just a few spaces in the input.

A small helper can do the job of collecting / grouping the number of characters in a line down to levels, taking care that - if the level doesn't already exist to indent - it is added, but only if higher.

This solves the problem that there is no 1: 1 relationship in your input between the indent size and the index. At least not obvious.

This helper is a model name Levels

and implements __invoke

to provide a level for indentation while transparently adding a new level as needed:

$levels = new Levels();
echo $levels(''); # 0
echo $levels('    '); # 1
echo $levels('    '); # 1
echo $levels('      '); # 2
echo $levels(' '); # Throws Exception, this is smaller than the highest one


So now we can turn the indentation into a level. This level allows us to run the stack. The stack allows you to build a tree. Good.

Parsing a string line by line can easily be done with a regular expression. Since I'm lazy, I just use preg_match_all

and return - per line - the indentation and name. Since I want more comfort, I move it into a function that always returns an array to me, so I can use it in an iterator:

$matches = function($string, $pattern)
    return preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)
        ? $matches : array();


Usage in input with pattern type

/^(?:(\s*)=> )?(.*)$/m


will give me an array for each line, i.e.:

array(whole_line, indent, name)


Do you see the pattern here? It is close to

1 Line <=> 1 Element (level, name)


With an object Levels

this can be matched, so just call the display function:

function (array $match) use ($levels) {
    list(, $indent, $name) = $match;
    $level = $levels($indent);
    return array($level, $name);


From array(line, indent, name)

to array(level, name)

. To make it available, it is returned by another function, into which you can type Levels


$map = function(Levels $levels) {
    return function ...
$map = $map(new Levels());


So everything to read from all lines. However, this needs to be placed in a tree. Remembering adding to the stack:

function($level, $element) use (&$stack) {
    $self = array($element, array());
    $stack[$level][] = &$self;
    $stack[$level + 1] = &$self[1];


( $element

is the name here). It really does require a stack, and the stack is actually a tree. So let's create another function that returns this function and allows each row to be pushed onto the stack to build a tree:

$tree = array();
$stack = function(array &$tree) {
    $stack[] = &$tree;
    return function($level, $element) use (&$stack) {
        $self = array($element, array());
        $stack[$level][] = &$self;
        $stack[$level + 1] = &$self[1];
$push = $stack($tree);


So the last thing to do is just process one element after another:

foreach ($matches($input, '/^(?:(\s*)=> )?(.*)$/m') as $match) {
    list($level, $element) = $map($match);
    $push($level, $element);


So now using $input

this creates an array with only (root) child nodes on it of the first level, and then array

with two entries for each node:

array(name, children)


Name is a string here, children are an array. So this is already done technically with an array / tree. But this is quite cumbersome because you want to be able to output the tree structure. This can be done by making recursive function calls or by implementing a recursive iterator.

Let me give you an example of a recursive iterator:

class TreeIterator extends ArrayIterator implements RecursiveIterator
    private $current;

    public function __construct($node)

    public function current()
        $this->current = parent::current();
        return $this->current[0];

    public function hasChildren()
        return !empty($this->current[1]);

    public function getChildren()
        return new self($this->current[1]);


It is just an array iterator (since all nodes are an array as well as all child nodes) and for the current node it returns the name. If you are asked to provide children, he checks to see if there are others and suggests them again as TreeIterator

. This makes it simple for example. text output:

$treeIterator = new RecursiveTreeIterator(
    new TreeIterator($tree));

foreach ($treeIterator as $val) echo $val, "\n";



\-cat, true cat
  |-domestic cat, house cat, Felis domesticus, Felis catus
  | |-kitty, kitty-cat, puss
  | |-mouser
  | |-alley cat
  | |-tom, tomcat
  | | \-gib
  | |-Angora, Angora cat
  | \-Siamese cat, Siamese
  |   \-blue point Siamese
    |-sand cat
    |-European wildcat, catamountain, Felis silvestris
    |-cougar, puma, catamount, mountain lion, painter, panther, Felis concolor
    |-ocelot, panther cat, Felis pardalis
    |-manul, Pallas cat, Felis manul
    \-lynx, catamount
      |-common lynx, Lynx lynx
      \-Canada lynx, Lynx canadensis


If you are looking for more control over HTML output combined with a recursive iterator, see the following question, which provides an example of HTML output based on <ul><li>


So what does it look like all together? Code to view immediately as a github entity .



Unlike my previous answer, which is quite lengthy and explains all the steps, it can also do the same, but more succinctly.

  • Splitting strings can be done with strtok

  • preg_match

    , then an "on" line that makes the display more immanent
  • Levels

    can be compressed into an array, taking for granted that the input is correct.

This time, for output, it's a recursive function, not an iterator, that spills out a nested list <ul>

. Example code ( Demo ):

// build tree
$tree = $levels = array();
$stack[1] = &$tree;
for ($line = strtok($input, $token = "\n"); $line; $line = strtok($token)) {
    if (!preg_match('/^(?:(\s*)=> )?(.*)$/', $line, $self)) {
    $indent = array_shift($self);
    $level = @$levels[$indent] ? : $levels[$indent] = count($levels) + 1;
    $stack[$level][] = &$self;
    $stack[$level + 1] = &$self[];

// output
function tree_print(array $tree, $in = '') {
    echo "$in<ul>\n";
    $i = $in . '  ';
    foreach ($tree as $n)
        printf("</li>\n", printf("$i<li>$n[0]") && $n[1] && printf($i, printf("\n") & tree_print($n[1], "$i  ")));

    echo "$in</ul>\n";


Edit: Next comes another step to remove the tree array entirely and draw the output directly. This is a bit crazy because it mixes up the reordering of the data and the output, which is dragging on things so it's easy to change them. Also, the previous example already looks cryptic, it goes beyond good and evil ( Demo ):


function echo_list($string) {
    foreach ($m = array_map(function($v) use (&$l) {
        return array(@$l[$i = &$v[1]] ? : $l[$i] = count($l) + 1, $v[2]);
    }, preg_match_all('/^(?:(\s*)=> )?(.*)$/m', $string, $m, PREG_SET_ORDER) ? $m : array()) as $i => $v) {
        $pi = str_repeat("    ", $pl = @$m[$i - 1][0]); # prev
        $ni = str_repeat("    ", $nl = @$m[$i + 1][0]); # next
        (($v[0] - $pl) > 0) && printf("$pi<ul>\n");     # is child of prev
        echo '  ' . str_repeat("    ", $v[0] - 1), "<li>$v[1]"; # output self
        if (!$close = (($nl - $v[0]) * -1)) echo "</li>\n"; # has sibling
        else if ($close < 0) echo "\n";                     # has children
        else for (printf("</li>\n$ni" . str_repeat("    ", $close - 1) . "</ul>\n"); --$close;) # is last child
                echo $ni, $nn = str_repeat("    ", $close - 1), "  </li>\n",
                     $ni, $nn, "</ul>\n";


Resets strtok

again and comes back to the idea of ​​using preg_match_all

. Additionally, it stores all parsed rows so that you can look back and forth to determine how many items <ul>

to open or close around the current item.



All Articles