Python - create hierarchy file

Consider the following unordered, tab-delimited file:

Asia    Srilanka
Srilanka    Colombo
Continents  Europe
India   Mumbai
India   Pune
Continents  Asia
Earth   Continents
Asia    India

The goal is to generate the following output (tab-delimited):

Earth   Continents  Asia    India   Mumbai
Earth   Continents  Asia    India   Pune
Earth   Continents  Asia    Srilanka    Colombo
Earth   Continents  Europe

I created the following script to accomplish the goal:

root = {}  # will finally contain the ROOT member(s) from which all the nodes emanate
link = {}  # maps every node to a dict of its immediate children
with open('relations.txt') as f:  # 'relations.txt' is an assumed file name
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parent, child = line.split('\t')
        if parent not in link:
            root[parent] = 1  # not seen before, so a root candidate for now
        if child in root:
            del root[child]  # anything that appears as a child is not a root
        if child not in link:
            link[child] = {}
        if parent not in link:
            link[parent] = {}
        link[parent][child] = 1

Now I intend to print the desired output using the two dicts created above (root and link). I'm not sure how to do this in Python, but I know that the following Perl code achieves the result:

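use feature 'say';  # say() is only available with this pragma (Perl 5.10+)

# %root and %link are assumed to hold the same data as the Python dicts above
print_links($_) for sort keys %root;

sub print_links
{
  my @path = @_;

  my %children = %{$link{$path[-1]}};
  if (%children)
  {
    print_links(@path, $_) for sort keys %children;
  }
  else
  {
    say join "\t", @path;
  }
}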

Could you please help me achieve the desired result in Python 3.x?

1 answer


I see the following subproblems:

  • reading the relations from a file;
  • building the hierarchy from the relations;
  • writing the hierarchy to a file.

Assuming that the height of the hierarchy tree is less than the default recursion limit (1000 in most cases; see sys.getrecursionlimit()), let's define utility functions for these individual tasks.
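To check that assumption up front, one could measure the height of the tree without recursing at all. The sketch below is a hypothetical helper (hierarchy_height is not part of this answer); it walks the tree breadth-first with collections.deque and compares the result with sys.getrecursionlimit(). It assumes the relations form a tree (no cycles) and hard-codes the dict that parse_relations returns for the sample file.

```python
import sys
from collections import deque


def hierarchy_height(relations, root='Earth'):
    """Number of nodes on the longest path from `root` down to a leaf.

    Walks the tree breadth-first with an explicit queue, so it does not
    recurse itself; assumes `relations` describes a tree (no cycles).
    """
    height = 0
    queue = deque([(root, 1)])  # pairs of (node, depth of that node)
    while queue:
        node, depth = queue.popleft()
        height = max(height, depth)
        for child in relations.get(node, []):
            queue.append((child, depth + 1))
    return height


# The relations that parse_relations builds from the sample file
relations = {'Asia': ['Srilanka', 'India'], 'Srilanka': ['Colombo'],
             'Continents': ['Europe', 'Asia'], 'India': ['Mumbai', 'Pune'],
             'Earth': ['Continents']}
height = hierarchy_height(relations)
print(height)  # 5
print(height < sys.getrecursionlimit())
```

The recursive flatten_hierarchy goes roughly one call deeper per level, so a height far below the limit, as here, leaves plenty of headroom.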

Utilities

  • Parsing the relations can be done with

    def parse_relations(lines):
        relations = {}
        for line in lines:
            cols = line.split()  # splits on tabs (or any other whitespace)
            if not cols:
                continue  # skip blank lines, e.g. a trailing empty line
            parent, child = cols
            relations.setdefault(parent, []).append(child)
        return relations

  • Building the hierarchy can be done with

    • Python >= 3.5

      def flatten_hierarchy(relations, parent='Earth'):
          try:
              children = relations[parent]
              for child in children:
                  sub_hierarchy = flatten_hierarchy(relations, child)
                  for element in sub_hierarchy:
                      try:
                          yield (parent, *element)
                      except TypeError:
                          # we've tried to unpack `None` value,
                          # it means that no successors left
                          yield (parent, child)
          except KeyError:
              # we've reached end of hierarchy
              yield None

    • Python < 3.5: the additional unpacking generalizations used above, such as (parent, *element), were only added in Python 3.5 by PEP 448, but they can be replaced with itertools.chain,

      for example:

      import itertools
      
      
      def flatten_hierarchy(relations, parent='Earth'):
          try:
              children = relations[parent]
              for child in children:
                  sub_hierarchy = flatten_hierarchy(relations, child)
                  for element in sub_hierarchy:
                      try:
                          yield tuple(itertools.chain([parent], element))
                      except TypeError:
                          # we've tried to unpack `None` value,
                          # it means that no successors left
                          yield (parent, child)
          except KeyError:
              # we've reached end of hierarchy
              yield None

  • Exporting the hierarchy to a file can be done with

    def write_hierarchy(hierarchy, path, delimiter='\t'):
        with open(path, mode='w') as file:
            for row in hierarchy:
                file.write(delimiter.join(row) + '\n')
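The generators above detect leaves by catching KeyError and TypeError. A different sketch (not the answer's approach; flatten_hierarchy_explicit is a hypothetical name) checks for leaves explicitly with dict.get and builds each path by tuple concatenation, so it needs neither exception handling nor PEP 448 unpacking and runs on any Python 3. One behavioral difference: for a root with no children it yields ('Earth',) instead of None.

```python
def flatten_hierarchy_explicit(relations, parent='Earth'):
    """Yield every path from `parent` down to a leaf as a tuple of names."""
    children = relations.get(parent)
    if not children:
        # `parent` has no children listed, so the path ends here
        yield (parent,)
        return
    for child in children:
        for path in flatten_hierarchy_explicit(relations, child):
            yield (parent,) + path


# The relations that parse_relations builds from the sample file
relations = {'Asia': ['Srilanka', 'India'], 'Srilanka': ['Colombo'],
             'Continents': ['Europe', 'Asia'], 'India': ['Mumbai', 'Pune'],
             'Earth': ['Continents']}
for row in flatten_hierarchy_explicit(relations):
    print(row)
```

For the sample data it yields the same four tuples, in the same order, as flatten_hierarchy.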

Usage

Assuming the file path is 'relations.txt':

with open('relations.txt') as file:
    relations = parse_relations(file)

gives us

>>> relations
{'Asia': ['Srilanka', 'India'],
 'Srilanka': ['Colombo'],
 'Continents': ['Europe', 'Asia'],
 'India': ['Mumbai', 'Pune'],
 'Earth': ['Continents']}

and our hierarchy is

>>> list(flatten_hierarchy(relations))
[('Earth', 'Continents', 'Europe'),
 ('Earth', 'Continents', 'Asia', 'Srilanka', 'Colombo'),
 ('Earth', 'Continents', 'Asia', 'India', 'Mumbai'),
 ('Earth', 'Continents', 'Asia', 'India', 'Pune')]

Finally, we export it to a file named 'hierarchy.txt':

>>> write_hierarchy(sorted(flatten_hierarchy(relations)), 'hierarchy.txt')

(we use sorted to get the rows in the same order as in the desired output file)
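Splitting on arbitrary whitespace, as parse_relations does, would break names that contain spaces. If that matters, one option is the standard csv module with delimiter='\t' for both reading and writing. The sketch below uses hypothetical names (parse_relations_tsv, write_hierarchy_tsv) and a hypothetical 'Sri Lanka' entry, and uses io.StringIO in place of real files; with real files, the csv documentation recommends opening them with newline=''.

```python
import csv
import io


def parse_relations_tsv(file):
    """Like parse_relations, but split only on tab characters."""
    relations = {}
    for row in csv.reader(file, delimiter='\t'):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        parent, child = row
        relations.setdefault(parent, []).append(child)
    return relations


def write_hierarchy_tsv(hierarchy, file):
    """Like write_hierarchy, but let csv.writer join the fields with tabs."""
    writer = csv.writer(file, delimiter='\t', lineterminator='\n')
    writer.writerows(hierarchy)


# In-memory stand-ins for the input and output files
source = io.StringIO('Earth\tAsia\nAsia\tSri Lanka\nSri Lanka\tColombo\n')
relations = parse_relations_tsv(source)
print(relations)  # {'Earth': ['Asia'], 'Asia': ['Sri Lanka'], 'Sri Lanka': ['Colombo']}

target = io.StringIO()
write_hierarchy_tsv([('Earth', 'Asia', 'Sri Lanka', 'Colombo')], target)
print(repr(target.getvalue()))  # 'Earth\tAsia\tSri Lanka\tColombo\n'
```

Because csv.writer only quotes fields containing the delimiter, the quote character or line terminators, a name with a plain space is written unquoted.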

PS

If you are not familiar with Python generators, we can define flatten_hierarchy as a regular function that builds and returns a list:

  • Python >= 3.5

    def flatten_hierarchy(relations, parent='Earth'):
        try:
            children = relations[parent]
        except KeyError:
            # we've reached end of hierarchy
            return None
        result = []
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            try:
                for element in sub_hierarchy:
                    result.append((parent, *element))
            except TypeError:
                # we've tried to iterate through `None` value,
                # it means that no successors left
                result.append((parent, child))
        return result

  • Python < 3.5

    import itertools
    
    
    def flatten_hierarchy(relations, parent='Earth'):
        try:
            children = relations[parent]
        except KeyError:
            # we've reached end of hierarchy
            return None
        result = []
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            try:
                for element in sub_hierarchy:
                    result.append(tuple(itertools.chain([parent], element)))
            except TypeError:
                # we've tried to iterate through `None` value,
                # it means that no successors left
                result.append((parent, child))
        return result
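For comparison, the asker's Perl print_links also translates fairly directly into Python 3. The sketch below is a hypothetical line-by-line translation (not the approach used above); it assumes root and link were built by the script in the question and hard-codes those dicts for the sample file so the snippet runs on its own.

```python
def print_links(link, path):
    """Print every root-to-leaf path, like the Perl print_links."""
    children = link[path[-1]]
    if children:
        for child in sorted(children):  # sorted() on a dict sorts its keys
            # path + [child] builds a new list, so sibling calls don't share state
            print_links(link, path + [child])
    else:
        print('\t'.join(path))


# root and link as the question's script builds them for the sample file
link = {
    'Earth': {'Continents': 1},
    'Continents': {'Europe': 1, 'Asia': 1},
    'Asia': {'Srilanka': 1, 'India': 1},
    'Srilanka': {'Colombo': 1},
    'India': {'Mumbai': 1, 'Pune': 1},
    'Europe': {}, 'Colombo': {}, 'Mumbai': {}, 'Pune': {},
}
root = {'Earth': 1}

for top in sorted(root):
    print_links(link, [top])
```

Since the children are visited in sorted order, the output rows already match the desired file without a separate sort step.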
