How do I find the priority value for each unique category in Python?
I have a lot of repeating categories, where each occurrence of a category can have a different weight, and I want to assign one weight to each unique category based on priority.
mydata
category original_wt predicted_wt categorized categorized_value
1 xxxxx 2.5 3.0 original 2.5
2 yyyyy 3.5 4.0 predicted 4.0
3 zzzzz 3.0 5.0 predicted 5.0
4 aaaaa 4.0 2.5 original 4.0
5 bbbbb 3.2 5.5 original 3.2
6 ccccc 4.6 3.5 predicted 3.5
7 xxxxx 2.5 4.0 original 2.5
8 xxxxx 4.0 5.5 predicted 5.5
9 yyyyy 2.5 4.0 predicted 4.0
10 yyyyy 3.0 2.0 predicted 2.0
11 aaaaa 5.0 4.5 original 5.0
Example 1: for category "xxxxx" we have three categorized values (2.5, 2.5, 5.5),
so out of these we have to give priority to 2.5 because it is repeated.
Example 2: for category "yyyyy" we have three categorized values (4.0, 4.0, 2.0),
so out of these we should prioritize 4.0 because it is the most repeated.
But if a category has only one item, it should stay the same. And if a category has two items with two different weights, we must keep the higher categorized value.
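As a minimal sketch of the selection rule described above, in plain Python without pandas: take the most frequent value, and when several values tie for most frequent, take the highest. The function name `pick_value` is hypothetical and not part of the original post.

```python
from collections import Counter

def pick_value(values):
    """Return the most frequent value; break ties by taking the highest."""
    counts = Counter(values)
    top = max(counts.values())  # highest number of occurrences
    # Among the values that occur `top` times, keep the largest one.
    return max(v for v, n in counts.items() if n == top)

print(pick_value([2.5, 2.5, 5.5]))  # 2.5 (repeated value wins)
print(pick_value([4.0, 5.0]))       # 5.0 (tie, so the higher value wins)
print(pick_value([3.2]))            # 3.2 (single item is kept)
```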
Expected output:
mydata
category original_wt predicted_wt categorized categorized_value
1 xxxxx 2.5 3.0 original 2.5
2 yyyyy 3.5 4.0 predicted 4.0
3 zzzzz 3.0 5.0 predicted 5.0
4 aaaaa 4.0 2.5 original 4.0
5 bbbbb 3.2 5.5 original 3.2
6 ccccc 4.6 3.5 predicted 3.5
7 aaaaa 5.0 4.5 original 5.0
Tried:
category_grouping_by_catg_value = mydata.groupby(['category','categorized_value']).apply(pd.DataFrame.mode).reset_index(drop=True)
By doing the above, I am getting some random values.
How can I do this in Python?
You might want to do something like this:
import pandas as pd

# Per category: most frequent categorized_value; if several modes tie, keep the largest
df['mode'] = df.groupby('category')['categorized_value'].transform(lambda s: s.mode().max())
print(df.drop_duplicates(['category', 'mode']).set_index('category').sort_index()[['categorized_value', 'mode']])
Updated the code to select the max categorized_value if more than one value is left after .mode.
categorized_value mode
category
aaaaa 4.0 5.0
bbbbb 3.2 3.2
ccccc 3.5 3.5
xxxxx 2.5 2.5
yyyyy 4.0 4.0
zzzzz 5.0 5.0
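For anyone who wants to reproduce this end to end, here is a self-contained sketch that rebuilds the sample data from the question and collapses it to one value per category. It uses `agg` rather than `transform`, which is a variation on the approach above and not the original answer's code; the variable name `result` is an assumption for illustration.

```python
import pandas as pd

# Sample data copied from the question (only the columns needed here).
df = pd.DataFrame({
    "category": ["xxxxx", "yyyyy", "zzzzz", "aaaaa", "bbbbb", "ccccc",
                 "xxxxx", "xxxxx", "yyyyy", "yyyyy", "aaaaa"],
    "categorized_value": [2.5, 4.0, 5.0, 4.0, 3.2, 3.5,
                          2.5, 5.5, 4.0, 2.0, 5.0],
})

# For each category: take the mode(s), then the largest one to break ties.
result = (
    df.groupby("category")["categorized_value"]
      .agg(lambda s: s.mode().max())
      .rename("mode")
)
print(result)
```

The resulting Series is indexed by category, so `result["aaaaa"]` gives 5.0 (a tie between 4.0 and 5.0, resolved to the higher value) and `result["xxxxx"]` gives 2.5.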