Python - create hierarchy file
Consider the following unordered, tab-delimited file:
Asia Srilanka
Srilanka Colombo
Continents Europe
India Mumbai
India Pune
Continents Asia
Earth Continents
Asia India
The goal is to generate the following output (tab-delimited):
Earth Continents Asia India Mumbai
Earth Continents Asia India Pune
Earth Continents Asia Srilanka Colombo
Earth Continents Europe
I created the following script to accomplish the goal:
root = {}  # this hash will finally contain the ROOT member from which all the nodes emanate
link = {}  # this is to hold the grouping of immediate children
for line in f:
    line = line.rstrip('\r\n')
    line = line.strip()
    cols = list(line.split('\t'))
    parent = cols[0]
    child = cols[1]
    if not parent in link:
        root[parent] = 1
    if child in root:
        del root[child]
    if not child in link:
        link[child] = {}
    if not parent in link:
        link[parent] = {}
    link[parent][child] = 1
Now I intend to print the desired output using the two dicts created above (root and link). I'm not sure how to do this in Python, but I know that the following Perl code achieves the result:
print_links($_) for sort keys %root;

sub print_links
{
    my @path = @_;
    my %children = %{$link{$path[-1]}};
    if (%children)
    {
        print_links(@path, $_) for sort keys %children;
    }
    else
    {
        say join "\t", @path;
    }
}
Could you please help me achieve the desired result in Python 3.x?
I see the following problems:
- reading relations from a file;
- building the hierarchy from the relations;
- writing the hierarchy to a file.
Assuming that the height of the hierarchy tree is less than the default recursion limit (1000 in most cases), let's define utility functions for these individual tasks.
Utilities
- Parsing relations can be done using

def parse_relations(lines):
    relations = {}
    splitted_lines = (line.split() for line in lines)
    for parent, child in splitted_lines:
        relations.setdefault(parent, []).append(child)
    return relations
- Building the hierarchy can be done with
  - Python >= 3.5:

def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            for element in sub_hierarchy:
                try:
                    yield (parent, *element)
                except TypeError:
                    # we tried to unpack a `None` value,
                    # which means there are no successors left
                    yield (parent, child)
    except KeyError:
        # we've reached the end of the hierarchy
        yield None
  - Python < 3.5: the additional unpacking generalizations used above were added by PEP 448 in Python 3.5, but they can be replaced with itertools.chain, e.g.

import itertools


def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            for element in sub_hierarchy:
                try:
                    yield tuple(itertools.chain([parent], element))
                except TypeError:
                    # we tried to iterate over a `None` value,
                    # which means there are no successors left
                    yield (parent, child)
    except KeyError:
        # we've reached the end of the hierarchy
        yield None
- Exporting the hierarchy to a file can be done with

def write_hierarchy(hierarchy, path, delimiter='\t'):
    with open(path, mode='w') as file:
        for row in hierarchy:
            file.write(delimiter.join(row) + '\n')
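The utilities above rest on two assumptions: parse_relations splits on any whitespace (so a name containing a space would break the two-column unpacking), and flatten_hierarchy defaults to a root named 'Earth'. Below is a minimal sketch of how both could be relaxed. The names parse_tab_relations and find_roots are hypothetical helpers, not part of the code above, and the sample lines are made up for illustration.

```python
def parse_tab_relations(lines):
    """Build {parent: [children]} from 'parent<TAB>child' lines."""
    relations = {}
    for line in lines:
        line = line.rstrip('\r\n')
        if not line.strip():
            continue  # skip blank lines
        # raises ValueError if a line does not have exactly two columns
        parent, child = line.split('\t')
        relations.setdefault(parent, []).append(child)
    return relations


def find_roots(relations):
    """Return, sorted, the parents that never appear as a child."""
    children = {child for kids in relations.values() for child in kids}
    return sorted(parent for parent in relations if parent not in children)


# illustrative input; note the name containing a space and the blank line
lines = ['Asia\tSri Lanka\n', 'Earth\tContinents\n', '\n', 'Continents\tAsia\n']
relations = parse_tab_relations(lines)
roots = find_roots(relations)
```

Each name returned by find_roots can then be passed as the parent argument of flatten_hierarchy instead of relying on the 'Earth' default.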
Usage
Assuming the file path is 'relations.txt':
with open('relations.txt') as file:
    relations = parse_relations(file)
gives us
>>> relations
{'Asia': ['Srilanka', 'India'],
'Srilanka': ['Colombo'],
'Continents': ['Europe', 'Asia'],
'India': ['Mumbai', 'Pune'],
'Earth': ['Continents']}
and our hierarchy
>>> list(flatten_hierarchy(relations))
[('Earth', 'Continents', 'Europe'),
('Earth', 'Continents', 'Asia', 'Srilanka', 'Colombo'),
('Earth', 'Continents', 'Asia', 'India', 'Mumbai'),
('Earth', 'Continents', 'Asia', 'India', 'Pune')]
Finally, export it to a file named 'hierarchy.txt':
>>> hierarchy = flatten_hierarchy(relations)
>>> write_hierarchy(sorted(hierarchy), 'hierarchy.txt')
(we use sorted to get the hierarchy in the same order as the desired output file)
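The recursive flatten_hierarchy relies on the tree height staying below the recursion limit (the value reported by sys.getrecursionlimit()). If that cannot be guaranteed, the same paths can be produced with an explicit stack. The following is only a sketch under the assumption that the relations form a tree with no cycles; iter_paths is a hypothetical name, not part of the answer's utilities.

```python
def iter_paths(relations, root='Earth'):
    """Yield every root-to-leaf path as a tuple, without recursion."""
    stack = [(root,)]
    while stack:
        path = stack.pop()
        children = relations.get(path[-1], [])
        if not children:
            yield path  # the last node has no children, so the path is complete
            continue
        # push in reverse so children are visited in their original order
        for child in reversed(children):
            stack.append(path + (child,))


relations = {
    'Asia': ['Srilanka', 'India'],
    'Srilanka': ['Colombo'],
    'Continents': ['Europe', 'Asia'],
    'India': ['Mumbai', 'Pune'],
    'Earth': ['Continents'],
}
paths = list(iter_paths(relations))
```

One difference from flatten_hierarchy: for a root without any children this sketch yields the one-element tuple (root,) rather than None.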
PS
If you are not familiar with Python generators, we can define the function flatten_hierarchy like
- Python >= 3.5:

def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
    except KeyError:
        # we've reached the end of the hierarchy
        return None
    result = []
    for child in children:
        sub_hierarchy = flatten_hierarchy(relations, child)
        try:
            for element in sub_hierarchy:
                result.append((parent, *element))
        except TypeError:
            # we tried to iterate over a `None` value,
            # which means there are no successors left
            result.append((parent, child))
    return result
- Python < 3.5:

import itertools


def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
    except KeyError:
        # we've reached the end of the hierarchy
        return None
    result = []
    for child in children:
        sub_hierarchy = flatten_hierarchy(relations, child)
        try:
            for element in sub_hierarchy:
                result.append(tuple(itertools.chain([parent], element)))
        except TypeError:
            # we tried to iterate over a `None` value,
            # which means there are no successors left
            result.append((parent, child))
    return result
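For completeness, the Perl routine from the question can also be ported fairly directly, reusing the root and link dicts that the question's script builds. This is only a sketch: iter_links is a hypothetical name, and the dict literals below are what that script should produce for the sample input, written out by hand so the example is self-contained.

```python
def iter_links(link, path):
    """Yield each complete path below `path` as a tab-joined string."""
    children = link.get(path[-1], {})
    if children:
        # descend into each child in sorted order, as the Perl version does
        for child in sorted(children):
            yield from iter_links(link, path + [child])
    else:
        yield '\t'.join(path)


# dicts as built by the question's script for the sample file (assumed)
root = {'Earth': 1}
link = {
    'Asia': {'Srilanka': 1, 'India': 1},
    'Srilanka': {'Colombo': 1},
    'Colombo': {},
    'Continents': {'Europe': 1, 'Asia': 1},
    'Europe': {},
    'India': {'Mumbai': 1, 'Pune': 1},
    'Mumbai': {},
    'Pune': {},
    'Earth': {'Continents': 1},
}

rows = [row for name in sorted(root) for row in iter_links(link, [name])]
print('\n'.join(rows))
```

Because the children are visited in sorted order, the rows come out in the order of the desired output without a separate sorting step.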