How to execute functions on group results in pandas in python?
I used this code to count the different quality scores for each user in each cluster:
>>> for name, group in df.groupby(["Cluster_id", "User"]):
...     print 'group name:', name
...     print 'group rows:'
...     print group
...     print 'counts of Quality values:'
...     print group["Quality"].value_counts()
...     raw_input()
...
This gives me output like:
group rows:
      Tag            User                       Quality                    Cluster_id
676   black fabric   http://steve.nl/user_1002  usefulness-useful          1
708   blond wood     http://steve.nl/user_1002  usefulness-useful          1
709   blond wood     http://steve.nl/user_1002  problematic-misspelling    1
1410  eames?         http://steve.nl/user_1002  usefulness-not_useful      1
1411  eames?         http://steve.nl/user_1002  problematic-misperception  1
3649  rocking chair  http://steve.nl/user_1002  usefulness-useful          1
3650  rocking chair  http://steve.nl/user_1002  problematic-misperception  1
counts of Quality values:
usefulness-useful            3
problematic-misperception    2
usefulness-not_useful        1
problematic-misspelling      1
Now I would like to check a condition on each row, i.e.:
if quality == 'usefulness-useful':
    good = good + 1
else:
    bad = bad + 1
I tried to store the output:
counts of Quality values:
usefulness-useful            3
problematic-misperception    2
usefulness-not_useful        1
problematic-misspelling      1
in a variable and iterate over that variable row by row, but it doesn't work. Can anyone suggest how to perform calculations on specific rows?
Once you have the group, you can iterate over it row by row using the .iterrows() method. It gives you the row index and the row:
In [33]: for row_number, row in group.iterrows():
   ....:     print row_number
   ....:     print row
   ....:
676
Tag           black fabric
User          http://steve.nl/user_1002
Quality       usefulness-useful
Cluster_id    1
Name: 676
708
Tag           blond wood
User          http://steve.nl/user_1002
Quality       usefulness-useful
Cluster_id    1
Name: 708
[etc]
and each of these rows can be indexed like a dictionary:
In [48]: row
Out[48]:
Tag           rocking chair
User          http://steve.nl/user_1002
Quality       problematic-misperception
Cluster_id    1
Name: 3650
In [49]: row["User"]
Out[49]: 'http://steve.nl/user_1002'
In [50]: row["Tag"]
Out[50]: 'rocking chair'
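If you want to try this without the original data, here is a minimal sketch that rebuilds the seven rows shown in the question as a DataFrame. It is written for Python 3 (the snippets in this thread are Python 2), and the column names and index labels are assumed from the printed output rather than taken from the asker's actual file.

```python
import pandas as pd

# Sample rows copied from the output in the question; the real df is assumed
# to have these column names and these index labels.
group = pd.DataFrame(
    {
        "Tag": ["black fabric", "blond wood", "blond wood", "eames?",
                "eames?", "rocking chair", "rocking chair"],
        "User": ["http://steve.nl/user_1002"] * 7,
        "Quality": ["usefulness-useful", "usefulness-useful",
                    "problematic-misspelling", "usefulness-not_useful",
                    "problematic-misperception", "usefulness-useful",
                    "problematic-misperception"],
        "Cluster_id": [1] * 7,
    },
    index=[676, 708, 709, 1410, 1411, 3649, 3650],
)

# iterrows() yields (index label, row) pairs; each row is a pandas Series
# that can be indexed by column name.
labels = []
for row_number, row in group.iterrows():
    labels.append(row_number)
    print(row_number, row["Tag"], row["Quality"])

# After the loop, the loop variable still holds the final row (label 3650).
last_row = row
```

Here `last_row["Tag"]` is 'rocking chair', matching the session above.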
So you can write your loop like this:
good = 0
bad = 0
for row_number, row in group.iterrows():
    if row['Quality'] == 'usefulness-useful':
        good += 1
    else:
        bad += 1
print 'good', good, 'bad', bad
which gives
good 3 bad 4
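The same loop can be wrapped in a small function so it is easy to call on any group. This is only a sketch in Python 3: the name `count_good_bad` and its `good_label` parameter are made up for illustration, and the data is the Quality column from the question typed in by hand.

```python
import pandas as pd

def count_good_bad(group, good_label="usefulness-useful"):
    """Count rows whose Quality equals good_label, and all remaining rows."""
    good = 0
    bad = 0
    for _, row in group.iterrows():
        if row["Quality"] == good_label:
            good += 1
        else:
            bad += 1
    return good, bad

# The seven Quality values shown in the question for this user and cluster.
group = pd.DataFrame({
    "Quality": ["usefulness-useful", "usefulness-useful",
                "problematic-misspelling", "usefulness-not_useful",
                "problematic-misperception", "usefulness-useful",
                "problematic-misperception"],
})

good, bad = count_good_bad(group)
print("good", good, "bad", bad)  # prints: good 3 bad 4
```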
This is a perfectly good way to do it if it makes sense to you. Another way is to work directly from the counts on the Quality column:
In [54]: counts = group["Quality"].value_counts()
In [55]: counts
Out[55]:
usefulness-useful            3
problematic-misperception    2
usefulness-not_useful        1
problematic-misspelling      1
In [56]: counts['usefulness-useful']
Out[56]: 3
and since bad = total - good, we have
In [57]: counts.sum() - counts['usefulness-useful']
Out[57]: 4
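To do this for every (Cluster_id, User) group rather than a single one, something like the following sketch should work (Python 3). The second user and its two rows are hypothetical, added only so there is a group with no 'usefulness-useful' rows. For such a group `counts['usefulness-useful']` would raise a KeyError, so the sketch uses `counts.get(..., 0)` instead.

```python
import pandas as pd

GOOD = "usefulness-useful"

df = pd.DataFrame({
    # First seven rows mirror the question; the last two are hypothetical.
    "Cluster_id": [1, 1, 1, 1, 1, 1, 1, 2, 2],
    "User": ["http://steve.nl/user_1002"] * 7
            + ["http://example.org/hypothetical_user"] * 2,
    "Quality": ["usefulness-useful", "usefulness-useful",
                "problematic-misspelling", "usefulness-not_useful",
                "problematic-misperception", "usefulness-useful",
                "problematic-misperception",
                "problematic-misspelling", "usefulness-not_useful"],
})

rows = []
for (cluster_id, user), group in df.groupby(["Cluster_id", "User"]):
    counts = group["Quality"].value_counts()
    # .get() returns 0 when the group has no "good" rows at all.
    good = int(counts.get(GOOD, 0))
    bad = int(counts.sum()) - good  # bad = total - good
    rows.append({"Cluster_id": cluster_id, "User": user,
                 "good": good, "bad": bad})

# One row per (Cluster_id, User) group, in sorted key order.
summary = pd.DataFrame(rows)
print(summary)
```

Collecting the per-group results into a list of dicts keeps the structure close to the loop in the question while giving you a single table to work with afterwards.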