Create nested tree dict from array of dicts with children
I have a dicts array retrieved from a web API. Each has a dict name
, description
, 'parent' and children
. The key children
has an array of dicts values as its value. For clarity, here's a dummy example:
[
{'name': 'top_parent', 'description': None, 'parent': None,
'children': [{'name': 'child_one'},
{'name': 'child_two'}]},
{'name': 'child_one', 'description': None, 'parent': 'top_parent',
'children': []},
{'name': 'child_two', 'description': None, 'parent': 'top_parent',
'children': [{'name': 'grand_child'}]},
{'name': 'grand_child', 'description': None, 'parent': 'child_two',
'children': []}
]
Each element in the array. The element can be the topmost parent and therefore does not exist in any of the arrays children
. An element can be both child and parent. Or the element can only be a child (it has no children of its own).
So, in a tree structure, you would have something like this:
top_parent
child_one
child_two
grand_child
In this contrived and simplified example, it top_parent
is the parent but not the child; child_one
- this is a child, but not a parent; child_two
- parent and child; and grand_child
is a child, but not a parent. This covers all possible states.
I want to be able to iterate over the dicts array 1 time and generate a nested dict that correctly represents the tree structure (however this is not possible 1 time, in the most efficient way). So, in this example, I would get a voice recorder that looks like this:
{
'top_parent': {
'child_one': {},
'child_two': {
'grand_child': {}
}
}
}
Strictly speaking, there is no need to have an element without children in order not to be keys, but this is preferable.
source to share
The fourth edit showing three versions is slightly cleaned up. The first version works from top to bottom and returns None as you asked, but essentially looping through the top-level array 3 times. The next version only projects it once, but returns empty dicts instead of None.
The final version works from the bottom up and is very clean. It can return empty dicts with one loop, or None with an extra loop:
from collections import defaultdict
my_array = [
{'name': 'top_parent', 'description': None, 'parent': None,
'children': [{'name': 'child_one'},
{'name': 'child_two'}]},
{'name': 'child_one', 'description': None, 'parent': 'top_parent',
'children': []},
{'name': 'child_two', 'description': None, 'parent': 'top_parent',
'children': [{'name': 'grand_child'}]},
{'name': 'grand_child', 'description': None, 'parent': 'child_two',
'children': []}
]
def build_nest_None(my_array):
childmap = [(d['name'], set(x['name'] for x in d['children']) or None)
for d in my_array]
all_dicts = dict((name, kids and {}) for (name, kids) in childmap)
results = all_dicts.copy()
for (name, kids) in ((x, y) for x, y in childmap if y is not None):
all_dicts[name].update((kid, results.pop(kid)) for kid in kids)
return results
def build_nest_empty(my_array):
all_children = set()
all_dicts = defaultdict(dict)
for d in my_array:
children = set(x['name'] for x in d['children'])
all_dicts[d['name']].update((x, all_dicts[x]) for x in children)
all_children.update(children)
top_name, = set(all_dicts) - all_children
return {top_name: all_dicts[top_name]}
def build_bottom_up(my_array, use_None=False):
all_dicts = defaultdict(dict)
for d in my_array:
name = d['name']
all_dicts[d['parent']][name] = all_dicts[name]
if use_None:
for d in all_dicts.values():
for x, y in d.items():
if not y:
d[x] = None
return all_dicts[None]
print(build_nest_None(my_array))
print(build_nest_empty(my_array))
print(build_bottom_up(my_array, True))
print(build_bottom_up(my_array))
Results in:
{'top_parent': {'child_one': None, 'child_two': {'grand_child': None}}}
{'top_parent': {'child_one': {}, 'child_two': {'grand_child': {}}}}
{'top_parent': {'child_one': None, 'child_two': {'grand_child': None}}}
{'top_parent': {'child_one': {}, 'child_two': {'grand_child': {}}}}
source to share
You can store lazy matching from names to nodes and then rebuild the hierarchy by processing only the link parent
(I assume the data is correct, so if A
marked as parent B
iff B
is listed among children A
).
nmap = {}
for n in nodes:
name = n["name"]
parent = n["parent"]
try:
# Was this node built before?
me = nmap[name]
except KeyError:
# No... create it now
if n["children"]:
nmap[name] = me = {}
else:
me = None
if parent:
try:
nmap[parent][name] = me
except KeyError:
# My parent will follow later
nmap[parent] = {name: me}
else:
root = me
The children
input property is only used to know if the element should be stored as None
in its parent (since it has no children), or if it should be a dictionary because it will have children at the end of the rebuild process. Storing childless nodes as empty dictionaries would simplify the code a bit by avoiding the need for this special case.
Using collections.defaultdict
, the code can also be simplified to create new nodes
import collections
nmap = collections.defaultdict(dict)
for n in nodes:
name = n["name"]
parent = n["parent"]
me = nmap[name]
if parent:
nmap[parent][name] = me
else:
root = me
This algorithm O(N)
assumes constant dictionary access and makes only one pass on input and requires a O(N)
space for the name -> node map (space requirement O(Nc)
for the original nochildren->None
version, where Nc
is the number of nodes with children).
source to share
My hit on it:
persons = [\
{'name': 'top_parent', 'description': None, 'parent': None,\
'children': [{'name': 'child_one'},\
{'name': 'child_two'}]},\
{'name': 'grand_child', 'description': None, 'parent': 'child_two',\
'children': []},\
{'name': 'child_two', 'description': None, 'parent': 'top_parent',\
'children': [{'name': 'grand_child'}]},\
{'name': 'child_one', 'description': None, 'parent': 'top_parent',\
'children': []},\
]
def findParent(name,parent,tree,found = False):
if tree == {}:
return False
if parent in tree:
tree[parent][name] = {}
return True
else:
for p in tree:
found = findParent(name,parent,tree[p],False) or found
return found
tree = {}
outOfOrder = []
for person in persons:
if person['parent'] == None:
tree[person['name']] = {}
else:
if not findParent(person['name'],person['parent'],tree):
outOfOrder.append(person)
for person in outOfOrder:
if not findParent(person['name'],person['parent'],tree):
print 'parent of ' + person['name'] + ' not found
print tree
leads to:
{'top_parent': {'child_two': {'grand_child': {}}, 'child_one': {}}}
It also picks up all children whose parent hasn't been added yet and then reconciles that at the end.
source to share