Pandas DataFrame to nested JSON
I am trying to convert a Pandas DataFrame to a nested JSON object. My DataFrame contains data in the following format:
     student        date  grade   course
0  Student_1  2017-06-25     93  ENGLISH
1  Student_2  2017-06-25     83  ENGLISH
2  Student_1  2017-06-25     93     MATH
3  Student_2  2017-06-25     83     MATH
4  Student_1  2017-06-26     90     MATH
5  Student_2  2017-06-26     85     MATH
6  Student_1  2017-06-26     96  ENGLISH
7  Student_2  2017-06-26     99  ENGLISH
I want to convert it to a JSON object in the following format:
[
    {'ENGLISH': [
        {
            'date' : '2017-06-25',
            'Student_1' : 93,
            'Student_2' : 83
        },
        {
            'date' : '2017-06-26',
            'Student_1' : 96,
            'Student_2' : 99
        }]
    },
    {'MATH': [
        {
            'date' : '2017-06-25',
            'Student_1' : 93,
            'Student_2' : 83
        },
        {
            'date' : '2017-06-26',
            'Student_1' : 90,
            'Student_2' : 85
        }]
    }
]
A simple call to .to_json() didn't work for me. Is there any way I can create a JSON object in the required format in Pandas?
You can first define a function that converts each (course, date) subgroup to a dict, then apply that function to every group, and finally combine the per-group dicts into one object keyed by course.
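def f(x):
    # x is the sub-frame for one (course, date) pair; x.name holds that key.
    record = {'date': x.name[1]}
    # Add one student -> grade entry per row of the group.
    record.update(zip(x['student'], x['grade']))
    return record

(
    df.groupby(['course', 'date'])
    .apply(f)                      # Series of dicts indexed by (course, date)
    .groupby(level=0)              # regroup by course
    .apply(lambda x: x.tolist())   # list of per-date dicts for each course
    .to_dict()
)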
Out[1006]:
{'ENGLISH': [{'Student_1': 93, 'Student_2': 83, 'date': '2017-06-25'},
             {'Student_1': 96, 'Student_2': 99, 'date': '2017-06-26'}],
 'MATH': [{'Student_1': 93, 'Student_2': 83, 'date': '2017-06-25'},
          {'Student_1': 90, 'Student_2': 85, 'date': '2017-06-26'}]}
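The result above is a single dict keyed by course, while the question asks for a list of one-key dicts serialized as JSON. A minimal sketch of that last step follows. It rebuilds the sample frame inline so it runs on its own, and it uses a plain loop over the groups rather than `apply`; the names `nested` and `as_list` are only illustrative.

```python
import json

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "student": ["Student_1", "Student_2"] * 4,
    "date": ["2017-06-25"] * 4 + ["2017-06-26"] * 4,
    "grade": [93, 83, 93, 83, 90, 85, 96, 99],
    "course": ["ENGLISH", "ENGLISH", "MATH", "MATH",
               "MATH", "MATH", "ENGLISH", "ENGLISH"],
})

# Build {course: [record, ...]} with one record per (course, date) group.
nested = {}
for (course, date), group in df.groupby(["course", "date"]):
    record = {"date": date}
    # .tolist() yields plain Python ints, which json.dumps can serialize.
    record.update(zip(group["student"], group["grade"].tolist()))
    nested.setdefault(course, []).append(record)

# Reshape into the list of single-key dicts requested in the question.
as_list = [{course: records} for course, records in nested.items()]
print(json.dumps(as_list, indent=4))
```

If the dict-of-lists shape is acceptable, `json.dumps(nested)` can be used directly and the list comprehension dropped.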
Try the following:
file.csv
student,date,grade,course
0,Student_1,2017-06-25,93,ENGLISH
1,Student_2,2017-06-25,83,ENGLISH
2,Student_1,2017-06-25,93,MATH
3,Student_2,2017-06-25,83,MATH
4,Student_1,2017-06-26,90,MATH
5,Student_2,2017-06-26,85,MATH
6,Student_1,2017-06-26,96,ENGLISH
7,Student_2,2017-06-26,99,ENGLISH
Execute:
from collections import defaultdict
import json
import pandas as pd

df = pd.read_csv('file.csv')
json_doc = defaultdict(list)  # course -> list of per-date records
for _id in df.T:
    data = df.T[_id]  # one row of the original frame
    key = data.course
    # Reuse the record for this date if the course already has one.
    for elt in json_doc[key]:
        if elt["date"] == data.date:
            elt[data.student] = data.grade
            break
    else:
        # No record for this date yet: start a new one.
        values = {'date': data.date, data.student: data.grade}
        json_doc[key].append(values)
print(json.dumps(json_doc, indent=4))
Output:
{
    "ENGLISH": [
        {
            "date": "2017-06-25",
            "Student_1": 93,
            "Student_2": 83
        },
        {
            "date": "2017-06-26",
            "Student_1": 96,
            "Student_2": 99
        }
    ],
    "MATH": [
        {
            "date": "2017-06-25",
            "Student_1": 93,
            "Student_2": 83
        },
        {
            "date": "2017-06-26",
            "Student_1": 90,
            "Student_2": 85
        }
    ]
}
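The loop above transposes the frame on every iteration and scans the list of records to find a matching date. A possible variant, sketched here under the assumption that the data is already in a DataFrame, iterates with `itertuples` and keeps an intermediate dict keyed by date; the names `by_course` and `result` are only illustrative.

```python
from collections import defaultdict
import json

import pandas as pd

# Sample data copied from the question (built inline instead of read from file.csv).
df = pd.DataFrame({
    "student": ["Student_1", "Student_2"] * 4,
    "date": ["2017-06-25"] * 4 + ["2017-06-26"] * 4,
    "grade": [93, 83, 93, 83, 90, 85, 96, 99],
    "course": ["ENGLISH", "ENGLISH", "MATH", "MATH",
               "MATH", "MATH", "ENGLISH", "ENGLISH"],
})

# course -> date -> record, so each lookup is a dict access rather than a list scan.
by_course = defaultdict(dict)
for row in df.itertuples(index=False):
    record = by_course[row.course].setdefault(row.date, {"date": row.date})
    # int() guards against NumPy integer types that json.dumps cannot serialize.
    record[row.student] = int(row.grade)

# Drop the intermediate date keys to get a list of records per course.
result = {course: list(dates.values()) for course, dates in by_course.items()}
print(json.dumps(result, indent=4))
```

The records for each course appear in the order their dates are first seen in the frame.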
If your DataFrame has a MultiIndex and you call myDataframe.to_dict(orient='index'), the result is a dictionary in which each key is a tuple of index values and each value is a dict of the remaining non-index columns.
You can then write a recursive function that nests the dict one level for each element of the key tuple, like this:
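def recurse(test):
    # All keys are tuples of the same length (one element per index level).
    lentpl = len(next(iter(test)))
    if lentpl == 1:
        return {k[0]: v for k, v in test.items()}
    nested = {}
    for k, v in test.items():
        # Merge entries that share a key prefix instead of overwriting them.
        nested.setdefault(k[:-1], {})[k[-1]] = v
    return recurse(nested)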
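As a rough usage sketch of this idea on the question's data: the frame is indexed by course, date and student, converted with `to_dict(orient='index')`, and the tuple keys are then folded into nested dicts. The helper name `nest_tuple_keys` is hypothetical, and it merges entries that share a key prefix so that no rows are lost.

```python
import pandas as pd


def nest_tuple_keys(flat):
    """Turn {(a, b, c): v} into {a: {b: {c: v}}}, merging shared prefixes."""
    if len(next(iter(flat))) == 1:
        # Base case: unwrap the remaining one-element tuples.
        return {key[0]: value for key, value in flat.items()}
    grouped = {}
    for key, value in flat.items():
        # Move the last tuple element into an inner dict under the shorter prefix.
        grouped.setdefault(key[:-1], {})[key[-1]] = value
    return nest_tuple_keys(grouped)


# Sample data copied from the question.
df = pd.DataFrame({
    "student": ["Student_1", "Student_2"] * 4,
    "date": ["2017-06-25"] * 4 + ["2017-06-26"] * 4,
    "grade": [93, 83, 93, 83, 90, 85, 96, 99],
    "course": ["ENGLISH", "ENGLISH", "MATH", "MATH",
               "MATH", "MATH", "ENGLISH", "ENGLISH"],
})

# Keys become (course, date, student) tuples; values are {'grade': ...} dicts.
flat = df.set_index(["course", "date", "student"]).to_dict(orient="index")
nested = nest_tuple_keys(flat)
print(nested["ENGLISH"]["2017-06-26"]["Student_2"])  # {'grade': 99}
```

Note that this produces dicts nested by index level (course, then date, then student) rather than the list-of-records layout in the question, so a further reshaping step would be needed to match that format exactly.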