Python: convert subset of list to dictionary by matching specific elements
I have a list in Python, each element of which is one German word, for example:
my_list = [..., 'Stahl ', 'Stahl ', 'Die ', '*die ', 'Rheinhausener ', 'Rhein=Hausener ', 'Mittelstreckenraketen', 'Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete',...]
In this list, each compound noun is immediately followed by its possible decompositions (there can be an arbitrary number of them).
E.g. "Mittelstreckenraketen" has 3 decompositions:
'Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete'
whereas "Rheinhausener" has only one:
'Rhein=Hausener'
The list contains approximately 50,000 items.
What I would like to do is extract only the compound nouns and their decompositions (discarding all other elements in the list) and read them into a dictionary with the compound noun as the key and the list of decompositions as the value, e.g.:
my_dict = {...,'Rheinhausener ': ['Rhein=Hausener '], 'Mittelstreckenraketen': ['Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete'],...}
Thus, discarding items such as:
'Stahl', 'Stahl', 'Die', '*die'
I was thinking about going through the list and, every time an element containing one or more equals signs (=) appears, taking the previous element and storing it as a key. But I'm too much of a Python newbie to figure out how to account for an arbitrary number of values for each dictionary entry, so I appreciate any help.
Here's one way to do it using defaultdict. A defaultdict(list) automatically creates an empty list when we access a key that doesn't exist yet.
#!/usr/bin/env python
from collections import defaultdict

my_list = [
    'Stahl ',
    'Stahl ',
    'Die ',
    '*die ',
    'Rheinhausener ',
    'Rhein=Hausener ',
    'Mittelstreckenraketen',
    'Mittel=Strecken=Rakete',
    'Mittel=strecken=Rakete',
    'Mittels=trecken=Rakete'
]

my_dict = defaultdict(list)
key = None
for word in my_list:
    if '=' in word:
        if key is None:
            # A decomposition appeared before any candidate key: report and skip it.
            print('Error: No key found for %r' % word)
            continue
        my_dict[key].append(word)
    else:
        # Remember the most recent word without '=' as the candidate key.
        key = word

for key in my_dict:
    print('%r: %r' % (key, my_dict[key]))
Output
'Rheinhausener ': ['Rhein=Hausener ']
'Mittelstreckenraketen': ['Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete']
Note that this code only works correctly if each key element is immediately followed by its series of expansions.
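As a rough sketch of how the same loop could be packaged for reuse, the version below wraps it in a function. The name group_decompositions is hypothetical, and stripping the trailing spaces from each word is an assumption about what is wanted (the question's expected output keeps them). Decompositions that appear before any candidate key are silently skipped, and a plain dict is returned.

```python
from collections import defaultdict


def group_decompositions(words):
    """Map each compound noun to the decompositions that directly follow it.

    Hypothetical helper: relies on the list layout described in the question.
    """
    grouped = defaultdict(list)
    key = None
    for raw in words:
        word = raw.strip()  # assumption: surrounding whitespace is not significant
        if '=' in word:
            if key is None:
                # Decomposition with no preceding word: nothing to attach it to.
                continue
            grouped[key].append(word)
        else:
            # Most recent word without '=' becomes the candidate key.
            key = word
    return dict(grouped)


sample = ['=orphan', 'Stahl ', 'Die ', '*die ',
          'Rheinhausener ', 'Rhein=Hausener ',
          'Mittelstreckenraketen', 'Mittel=Strecken=Rakete']
result = group_decompositions(sample)
print(result)
```

Words such as 'Stahl' never reach the result, because a key is only created when a decomposition is appended to it.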
You can use defaultdict:
from collections import defaultdict

my_list = ['Stahl ', 'Stahl ', 'Die ', '*die ', 'Rheinhausener ', 'Rhein=Hausener ', 'Mittelstreckenraketen', 'Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete']

my_dict = defaultdict(list)
value = ''
for item in my_list:
    if '=' not in item:
        # Remember the last word without '='; it is the key for what follows.
        value = item
    else:
        my_dict[value].append(item)

print(my_dict)
which prints (under Python 2; Python 3 shows <class 'list'> in place of <type 'list'>)
defaultdict(<type 'list'>, {'Rheinhausener ': ['Rhein=Hausener '], 'Mittelstreckenraketen': ['Mittel=Strecken=Rakete', 'Mittel=strecken=Rakete', 'Mittels=trecken=Rakete']})
Each decomposition is stored under the last element seen without the '=' character, which is the compound noun we are trying to get.
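One thing to keep in mind is that the result is a defaultdict rather than the plain dictionary asked for in the question. The small sketch below (variable names are illustrative) shows the practical difference: merely looking up a missing key on a defaultdict inserts it, whereas a plain dict copy made with dict() does not change on lookup.

```python
from collections import defaultdict

grouped = defaultdict(list)
grouped['Rheinhausener '].append('Rhein=Hausener ')

# Take an ordinary-dict snapshot of the grouped data.
plain = dict(grouped)

# Reading a missing key on the defaultdict returns [] AND inserts the key.
missing = grouped['Stahl ']

print(sorted(grouped))           # ['Rheinhausener ', 'Stahl ']
print('Stahl ' in plain)         # False
print(plain.get('Stahl ', []))   # [] and 'plain' is left unchanged
```

Converting with dict() once the loop has finished is therefore a reasonable last step if later code only reads from the dictionary.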