Extract data from a complex data structure in Python

I have a data structure like

[ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
  {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

      

This list contains many dictionaries, each with 3 pairs: 'uid': 'test_subject145', 'class':'?', 'data':[]. In the last pair, 'data', the value is a list that in turn contains a dictionary with 2 pairs: 'chunk':1, 'writing':[]. In the pair 'writing', the value is a list that again contains many lists. What I want to extract is the content of these sentences, such as 'this is exciting' and 'you are good', and so on, and then place them in a simple list. Its final form should be:

list_final = ['this is exciting', 'you are good', 'he died',... ]



3 answers


Given that your original list is named input, just use a list comprehension:

[elem for dic in input
      for dat in dic.get('data',())
      for writing in dat.get('writing',())
      for elem in writing]
      

You can use .get(..,()) so that the code still works when a key is missing: if there is no such key, .get returns the empty tuple (), so that loop has nothing to iterate over.
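To make the fallback concrete, here is a minimal sketch of the same comprehension run against a record that has no 'data' key at all. The names records and flatten_writing, and the 'no_data_subject' record, are made up for illustration and are not part of the question.

```python
# Sample records; the second one deliberately lacks a 'data' key (hypothetical).
records = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'no_data_subject', 'class': '?'},
]


def flatten_writing(records):
    """Collect every string under data -> writing, skipping missing keys."""
    return [elem
            for dic in records
            for dat in dic.get('data', ())        # () when 'data' is absent
            for writing in dat.get('writing', ())  # () when 'writing' is absent
            for elem in writing]


result = flatten_writing(records)
print(result)  # ['this is exciting', 'you are good']
```

With dic['data'] instead of dic.get('data', ()), the second record would raise a KeyError.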



Based on your example input, we get:

>>> input = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]]}  ]  },
...       {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ] ]}  ] }]
>>> 
>>> [elem for dic in input
...       for dat in dic.get('data',())
...       for writing in dat.get('writing',())
...       for elem in writing]
['this is exciting', 'you are good', 'he died', 'go ahead']
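If you would rather not build the whole list at once, the same traversal can probably be written as a generator. This is an alternative sketch, not part of the answer above: it uses itertools.chain.from_iterable to flatten the innermost lists, and the names records and iter_writing are assumptions chosen here (records is used instead of input to avoid shadowing the built-in).

```python
from itertools import chain

records = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'test_subject166', 'class': '?',
     'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]},
]


def iter_writing(records):
    """Lazily yield each string found under data -> writing."""
    for dic in records:
        for dat in dic.get('data', ()):
            # 'writing' is a list of lists; chain.from_iterable flattens one level.
            yield from chain.from_iterable(dat.get('writing', ()))


list_final = list(iter_writing(records))
print(list_final)
# ['this is exciting', 'you are good', 'he died', 'go ahead']
```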

      



TL;DR

[s for dic in data
   for data_dict in dic['data']
   for writing_sub_list in data_dict['writing']
   for s in writing_sub_list]  # 's' avoids shadowing the built-in str

      

Just move slowly and do one layer at a time. Then refactor your code to make it smaller.

data = [{'class': '?',
         'data': [{'chunk': 1,
                   'writing': [['this is exciting'], ['you are good']]}],
         'uid': 'test_subject145'},
        {'class': '?',
         'data': [{'chunk': 2,
         'writing': [['he died'], ['go ahead']]}],
         'uid': 'test_subject166'}]

for d in data:
    print(d)
# {'class': '?', 'uid': 'test_subject145', 'data': [{'writing': [['this is exciting'], ['you are good']], 'chunk': 1}]}
# {'class': '?', 'uid': 'test_subject166', 'data': [{'writing': [['he died'], ['go ahead']], 'chunk': 2}]}

for d in data:
     data_list = d['data']
     print(data_list)
# [{'writing': [['this is exciting'], ['you are good']], 'chunk': 1}]
# [{'writing': [['he died'], ['go ahead']], 'chunk': 2}]

for d in data:
     data_list = d['data']
     for d2 in data_list:
         print(d2)
# {'writing': [['this is exciting'], ['you are good']], 'chunk': 1}
# {'writing': [['he died'], ['go ahead']], 'chunk': 2}

for d in data:
     data_list = d['data']
     for d2 in data_list:
         writing_list = d2['writing']
         print(writing_list)
# [['this is exciting'], ['you are good']]
# [['he died'], ['go ahead']]

for d in data:
     data_list = d['data']
     for d2 in data_list:
         writing_list = d2['writing']
         for writing_sub_list in writing_list:
             print(writing_sub_list)
# ['this is exciting']
# ['you are good']
# ['he died']
# ['go ahead']

for d in data:
     data_list = d['data']
     for d2 in data_list:
         writing_list = d2['writing']
         for writing_sub_list in writing_list:
             for s in writing_sub_list:  # 's' avoids shadowing the built-in str
                 print(s)
# this is exciting
# you are good
# he died
# go ahead

      



Then, to condense this into something smaller (but harder to read), rewrite the code above as a single comprehension. It should be easy to see how to go from one to the other:

strings = [s for d in data for d2 in d['data'] for wsl in d2['writing'] for s in wsl]
# ['this is exciting', 'you are good', 'he died', 'go ahead']

      

Then make it pretty with better names, as in Willem's answer:

[s for dic in data
   for data_dict in dic['data']
   for writing_sub_list in data_dict['writing']
   for s in writing_sub_list]  # 's' avoids shadowing the built-in str
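As a quick check of the step from loops to comprehension, the sketch below collects the strings with explicit nested loops and compares the result to the comprehension. The variable names from_loops and from_comprehension are made up for this illustration; the data is the sample from this answer.

```python
data = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'test_subject166', 'class': '?',
     'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]},
]

# Explicit nested loops, appending instead of printing.
from_loops = []
for d in data:
    for d2 in d['data']:
        for writing_sub_list in d2['writing']:
            for s in writing_sub_list:
                from_loops.append(s)

# The comprehension lists the same `for` clauses in the same order,
# with the appended expression moved to the front.
from_comprehension = [s
                      for d in data
                      for d2 in d['data']
                      for writing_sub_list in d2['writing']
                      for s in writing_sub_list]

print(from_loops == from_comprehension)  # True
print(from_comprehension)
# ['this is exciting', 'you are good', 'he died', 'go ahead']
```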

      



I believe the following will work:

# Two sample entries; the real list has more of the same shape.
lista = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ] ]}  ]  },
          {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ] ]}  ] } ]

list_of_final_products = []

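for itema in lista:
  try:
    for data_item in itema['data']:
      for writa in data_item['writing']:
        for writa_itema in writa:
          # append the string itself, not the enclosing sub-list
          list_of_final_products.append(writa_itema)
  except KeyError:
    # skip entries that lack a 'data' or 'writing' key
    pass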

      

I found the question mentioned above useful for understanding this: Python, getting a list of values from a list of dicts (thanks, McGrady).
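For comparison, the simpler one-level case (pulling one value out of each dict in a list) might look like the sketch below on the same sample data. The variable names uids and chunks are chosen here for illustration only.

```python
data = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'test_subject166', 'class': '?',
     'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]},
]

# One level: one value per dictionary in the outer list.
uids = [d['uid'] for d in data]
print(uids)  # ['test_subject145', 'test_subject166']

# Two levels: one value per dictionary in each inner 'data' list.
chunks = [d2['chunk'] for d in data for d2 in d['data']]
print(chunks)  # [1, 2]
```

Each extra level of nesting in the structure adds one more for clause, which is how the four-level version for 'writing' comes about.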
