Python: convert subset of list to dictionary by matching specific elements

I have a list in Python, each element of which is one German word, for example:

my_list = [..., 'Stahl ', 'Stahl ', 'Die ', '*die ', 'Rheinhausener ', 'Rhein=Hausener ', 'Mittelstreckenraketen', 'Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete',...]

      

In this list, each compound noun is immediately followed by its possible decompositions (there can be an arbitrary number of them).

E.g. "Mittelstreckenraketen" has 3 decompositions:

'Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete'

whereas "Rheinhausener" only has one:

'Rhein=Hausener'

The list contains approximately 50,000 items.

What I would like to do is extract only the compound nouns and their decompositions (discarding all other elements in the list) and read them into a dictionary with the compound noun as the key and the list of decompositions as the value, e.g.:

my_dict = {...,'Rheinhausener ': ['Rhein=Hausener '], 'Mittelstreckenraketen': ['Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete'],...}

      

Thus, discarding items such as:

'Stahl', 'Stahl', 'Die', '*die'

I was thinking about going through the list and, every time an element containing one or more equals signs (=) appears, taking the previous element and storing it as a key. But I'm too much of a Python newbie to figure out how to account for an arbitrary number of values for each dictionary entry, so I'd appreciate any help.



2 answers


Here's one way to do it using defaultdict. A defaultdict(list) automatically creates an empty list the first time we access a key that doesn't exist yet.

#!/usr/bin/env python

from collections import defaultdict

my_list = [
    'Stahl ',
    'Stahl ',
    'Die ',
    '*die ',
    'Rheinhausener ',
    'Rhein=Hausener ',
    'Mittelstreckenraketen',
    'Mittel=Strecken=Rakete',
    'Mittel=strecken=Rakete',
    'Mittels=trecken=Rakete'
]

my_dict = defaultdict(list)

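key = None  # most recent word without '=' (the candidate compound noun)
for word in my_list:
    if '=' in word:
        if key is None:
            # A decomposition with no preceding word: report it and skip it.
            print('Error: No key found for %r' % word)
            continue
        my_dict[key].append(word)
    else:
        key = word

for key in my_dict:
    print('%r: %r' % (key, my_dict[key]))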

      

Output



'Rheinhausener ': ['Rhein=Hausener ']
'Mittelstreckenraketen': ['Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete']

      

Note that this code only works correctly if each key element is immediately followed by its series of decompositions.
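To make that caveat concrete, here is a minimal sketch (the function name group_decompositions and the misordered sample list are made up for illustration, they are not from the question). It wraps the same loop in a function and feeds it a list where an unrelated word sits between a compound noun and its decomposition:

```python
from collections import defaultdict


def group_decompositions(words):
    """Map each word to the '='-separated entries that directly follow it."""
    groups = defaultdict(list)
    key = None  # most recent word without '='
    for word in words:
        if '=' in word:
            if key is not None:  # ignore a decomposition with nothing before it
                groups[key].append(word)
        else:
            key = word
    return dict(groups)


# Hypothetical input: 'Die ' sits between the compound noun and its
# decomposition, so the decomposition is filed under the wrong key.
misordered = ['Rheinhausener ', 'Die ', 'Rhein=Hausener ']
print(group_decompositions(misordered))
# {'Die ': ['Rhein=Hausener ']}
```

So the approach relies entirely on the ordering described in the question; if the real data can violate it, the result will silently be wrong rather than raise an error.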



You can use defaultdict:

from collections import defaultdict

my_list = ['Stahl ', 'Stahl ', 'Die ', '*die ', 'Rheinhausener ', 'Rhein=Hausener ', 'Mittelstreckenraketen', 'Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete']

my_dict = defaultdict(list)

key = ''  # most recent item without '='
for item in my_list:
    if '=' not in item:
        key = item
    else:
        my_dict[key].append(item)

print(my_dict)

which prints (under Python 3)

defaultdict(<class 'list'>, {'Rheinhausener ': ['Rhein=Hausener '], 'Mittelstreckenraketen': ['Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete']})

      

The loop remembers the last element it saw without the '=' character, which is the compound noun that the following decompositions belong to.
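If you would rather end up with a plain dict, the same idea can be sketched with dict.setdefault instead of defaultdict. The function name build_compound_dict is hypothetical, and stripping the trailing spaces seen in the sample data is an assumption about what you want, not something the question asks for:

```python
def build_compound_dict(words):
    """Collect decompositions under the word that precedes them."""
    result = {}
    current = None  # last element seen without '='
    for raw in words:
        word = raw.strip()  # assumption: trailing spaces are not significant
        if '=' in word:
            if current is not None:
                # setdefault creates the list on first use, then returns it
                result.setdefault(current, []).append(word)
        else:
            current = word
    return result


sample = ['Stahl ', 'Die ', '*die ', 'Rheinhausener ', 'Rhein=Hausener ']
print(build_compound_dict(sample))
# {'Rheinhausener': ['Rhein=Hausener']}
```

Words such as 'Stahl' never get an entry because a key is only created when a decomposition is actually appended to it.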
