Set up consolidation to merge and align tree structure
I have a set of data like this:
data = { 1: {"root": [2],
"leaf": [10, 11, 12],
},
2: {"root": [1,3],
"leaf": [13, 14, 15],
},
3: { "root": [2],
"leaf": [16, 17],
},
4: {"root": [],
"leaf": [17, 18, 19],
},
5: { "root": [],
"leaf": [20, 21]
},
}
From this data, the original key is the root index node, it contains a dictionary explaining which root and leaf nodes are associated with it.
I want to combine all indexes into linked lists.
- Root index linked by root index, both / all root indexes and all index indexes are concatenated into the resulting list.
- A root index can be linked to another root through a leaf, root indices and all leaf indices are merged into the resulting list.
I am having trouble finding the best way to traverse and combine the data. From the above data Expected output :
[[1, 2, 3, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [5, 20, 21]]
The fixed try seems to work, is there a better method?
class MergeMachine(object):
processed = []
def merge(self, idx, parent_indexes, existing):
if idx not in self.processed:
parent_indexes.append(idx)
if self.data[idx]["root"]:
for related_root_idx in self.data[idx]["root"]:
if related_root_idx not in self.processed and related_root_idx not in parent_indexes:
existing.extend(self.merge(related_root_idx, parent_indexes, existing))
self.processed.append(related_root_idx)
existing.append(idx)
existing.extend(self.data[idx]["leaf"])
self.processed.append(idx)
return existing
def process(self, data):
results = []
self.data = data
for root_idx in self.data.keys():
r = set(self.merge(root_idx, [], []))
if r:
combined = False
for result_set in results:
if not r.isdisjoint(result_set):
result_set.union(r)
combined = True
if not combined:
results.append(r)
return results
mm = MergeMachine()
mm.process(data)
Is there an efficient way to combine the data into the expected result?
source to share
I have no idea if it is efficient, but it works:
data = #your data as posted
data = [set ( [k] ) | set (v ['root'] ) | set (v ['leaf'] ) for k, v in data.items () ]
merged = []
while data:
e0 = data [0]
for idx, e in enumerate (data [1:] ):
if e0 & e:
data [idx + 1] = e | e0 #idx is off by 1 as I enumerate data [1:]
break
else: merged.append (e0)
data = data [1:]
print (merged)
My guess is that in the worst case (i.e. no possible merge) the cost should be O (n ** 2). And it is serially without recursion.
source to share
I came up with this which is similar to but not quite the same as above. Mine is destructive, it consumes the input data structure and I think it is limited to one point (On ^ 2 in case none of the input data is bound).
def merge(data):
result = []
while data:
k, v = data.popitem()
temp = set([k]) | set(v['root']) | set(v['leaf'])
for idx, test in enumerate(result):
if test & temp:
result[idx] |= temp
break
else:
result.append(temp)
return result
source to share