Create nested tree dict from array of dicts with children

I have a dicts array retrieved from a web API. Each has a dict name

, description

, 'parent' and children

. The key children

has an array of dicts values ​​as its value. For clarity, here's a dummy example:

[
  {'name': 'top_parent', 'description': None, 'parent': None,
   'children': [{'name': 'child_one'},
                {'name': 'child_two'}]},
  {'name': 'child_one', 'description': None, 'parent': 'top_parent',
   'children': []},
  {'name': 'child_two', 'description': None, 'parent': 'top_parent',
   'children': [{'name': 'grand_child'}]},
  {'name': 'grand_child', 'description': None, 'parent': 'child_two',
   'children': []}
]

      

Each element in the array. The element can be the topmost parent and therefore does not exist in any of the arrays children

. An element can be both child and parent. Or the element can only be a child (it has no children of its own).

So, in a tree structure, you would have something like this:

top_parent
  child_one
  child_two
    grand_child

      

In this contrived and simplified example, it top_parent

is the parent but not the child; child_one

- this is a child, but not a parent; child_two

- parent and child; and grand_child

is a child, but not a parent. This covers all possible states.

I want to be able to iterate over the dicts array 1 time and generate a nested dict that correctly represents the tree structure (however this is not possible 1 time, in the most efficient way). So, in this example, I would get a voice recorder that looks like this:

{
  'top_parent': {
    'child_one': {},
    'child_two': {
      'grand_child': {}
    }
  }    
}

      

Strictly speaking, there is no need to have an element without children in order not to be keys, but this is preferable.

+3


source to share


3 answers


The fourth edit showing three versions is slightly cleaned up. The first version works from top to bottom and returns None as you asked, but essentially looping through the top-level array 3 times. The next version only projects it once, but returns empty dicts instead of None.

The final version works from the bottom up and is very clean. It can return empty dicts with one loop, or None with an extra loop:

from collections import defaultdict

my_array = [
  {'name': 'top_parent', 'description': None, 'parent': None,
   'children': [{'name': 'child_one'},
                {'name': 'child_two'}]},
  {'name': 'child_one', 'description': None, 'parent': 'top_parent',
   'children': []},
  {'name': 'child_two', 'description': None, 'parent': 'top_parent',
   'children': [{'name': 'grand_child'}]},
  {'name': 'grand_child', 'description': None, 'parent': 'child_two',
   'children': []}
]

def build_nest_None(my_array):
    childmap = [(d['name'], set(x['name'] for x in d['children']) or None)
                for d in my_array]
    all_dicts = dict((name, kids and {}) for (name, kids) in childmap)
    results = all_dicts.copy()
    for (name, kids) in ((x, y) for x, y in childmap if y is not None):
        all_dicts[name].update((kid, results.pop(kid)) for kid in kids)
    return results

def build_nest_empty(my_array):
    all_children = set()
    all_dicts = defaultdict(dict)
    for d in my_array:
        children = set(x['name'] for x in d['children'])
        all_dicts[d['name']].update((x, all_dicts[x]) for x in children)
        all_children.update(children)
    top_name, = set(all_dicts) - all_children
    return {top_name: all_dicts[top_name]}


def build_bottom_up(my_array, use_None=False):
    all_dicts = defaultdict(dict)
    for d in my_array:
        name = d['name']
        all_dicts[d['parent']][name] = all_dicts[name]

    if use_None:
        for d in all_dicts.values():
            for x, y in d.items():
                if not y:
                    d[x] = None

    return all_dicts[None]

print(build_nest_None(my_array))
print(build_nest_empty(my_array))
print(build_bottom_up(my_array, True))
print(build_bottom_up(my_array))

      



Results in:

{'top_parent': {'child_one': None, 'child_two': {'grand_child': None}}}
{'top_parent': {'child_one': {}, 'child_two': {'grand_child': {}}}}
{'top_parent': {'child_one': None, 'child_two': {'grand_child': None}}}
{'top_parent': {'child_one': {}, 'child_two': {'grand_child': {}}}}

      

+2


source


You can store lazy matching from names to nodes and then rebuild the hierarchy by processing only the link parent

(I assume the data is correct, so if A

marked as parent B

iff B

is listed among children A

).

nmap = {}
for n in nodes:
    name = n["name"]
    parent = n["parent"]
    try:
        # Was this node built before?
        me = nmap[name]
    except KeyError:
        # No... create it now
        if n["children"]:
            nmap[name] = me = {}
        else:
            me = None
    if parent:
        try:
            nmap[parent][name] = me
        except KeyError:
            # My parent will follow later
            nmap[parent] = {name: me}
    else:
        root = me

      

The children

input property is only used to know if the element should be stored as None

in its parent (since it has no children), or if it should be a dictionary because it will have children at the end of the rebuild process. Storing childless nodes as empty dictionaries would simplify the code a bit by avoiding the need for this special case.



Using collections.defaultdict

, the code can also be simplified to create new nodes

import collections
nmap = collections.defaultdict(dict)
for n in nodes:
    name = n["name"]
    parent = n["parent"]
    me = nmap[name]
    if parent:
        nmap[parent][name] = me
    else:
        root = me

      

This algorithm O(N)

assumes constant dictionary access and makes only one pass on input and requires a O(N)

space for the name -> node map (space requirement O(Nc)

for the original nochildren->None

version, where Nc

is the number of nodes with children).

+1


source


My hit on it:

persons = [\
  {'name': 'top_parent', 'description': None, 'parent': None,\
   'children': [{'name': 'child_one'},\
                {'name': 'child_two'}]},\
  {'name': 'grand_child', 'description': None, 'parent': 'child_two',\
   'children': []},\
  {'name': 'child_two', 'description': None, 'parent': 'top_parent',\
   'children': [{'name': 'grand_child'}]},\
  {'name': 'child_one', 'description': None, 'parent': 'top_parent',\
   'children': []},\
]

def findParent(name,parent,tree,found = False):
    if tree == {}:
        return False
    if parent in tree:
        tree[parent][name] = {}
        return True
    else:
        for p in tree:
            found = findParent(name,parent,tree[p],False) or found
        return found

tree = {}
outOfOrder = []
for person in persons:
    if person['parent'] == None:
        tree[person['name']] = {}
    else:
        if not findParent(person['name'],person['parent'],tree):
            outOfOrder.append(person)
for person in outOfOrder:
    if not findParent(person['name'],person['parent'],tree):
        print 'parent of ' + person['name']  + ' not found
print tree

      

leads to:

{'top_parent': {'child_two': {'grand_child': {}}, 'child_one': {}}}

      

It also picks up all children whose parent hasn't been added yet and then reconciles that at the end.

0


source







All Articles