How do I find the priority value for each unique category in Python?

I have many repeated categories, where each occurrence of a category can have a different weight, and I want to assign one weight to each unique category based on priority.

mydata

    category  original_wt  predicted_wt  categorized  categorized_value
1   xxxxx     2.5          3.0           original     2.5
2   yyyyy     3.5          4.0           predicted    4.0
3   zzzzz     3.0          5.0           predicted    5.0
4   aaaaa     4.0          2.5           original     4.0
5   bbbbb     3.2          5.5           original     3.2
6   ccccc     4.6          3.5           predicted    3.5
7   xxxxx     2.5          4.0           original     2.5
8   xxxxx     4.0          5.5           predicted    5.5
9   yyyyy     2.5          4.0           predicted    4.0
10  yyyyy     3.0          2.0           predicted    2.0
11  aaaaa     5.0          4.5           original     5.0

      

Example 1: for category "xxxxx" we have three categorized values (2.5, 2.5, 5.5), so we should give priority to 2.5 because it is repeated.
Example 2: for category "yyyyy" we have three categorized values (4.0, 4.0, 2.0), so we should give priority to 4.0 because it is the most frequent.

But if a category has only one item, its value should stay as it is. And if a category has two items with two different weights, we must keep the higher categorized value.
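The rule described above can be sketched for a single category's values in plain Python. This is only an illustration: the function name `priority_value` is made up here, and applying "keep the higher value" to any tie (not just two items) is an assumption.

```python
from collections import Counter

def priority_value(values):
    """Return the most frequent value; on a tie, return the largest tied value.

    Hypothetical helper for illustration, not part of any library.
    """
    counts = Counter(values)
    top = max(counts.values())  # highest frequency seen
    return max(v for v, c in counts.items() if c == top)

print(priority_value([2.5, 2.5, 5.5]))  # xxxxx: 2.5 is repeated
print(priority_value([4.0, 4.0, 2.0]))  # yyyyy: 4.0 is repeated
print(priority_value([4.0, 5.0]))       # aaaaa: two different values, keep the higher
print(priority_value([3.2]))            # bbbbb: single item stays as it is
```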

Expected output:
mydata

    category  original_wt  predicted_wt  categorized  categorized_value
1   xxxxx     2.5          3.0           original     2.5
2   yyyyy     3.5          4.0           predicted    4.0
3   zzzzz     3.0          5.0           predicted    5.0
4   aaaaa     4.0          2.5           original     4.0
5   bbbbb     3.2          5.5           original     3.2
6   ccccc     4.6          3.5           predicted    3.5
7   aaaaa     5.0          4.5           original     5.0


Tried:
category_grouping_by_catg_value = mydata.groupby(['category','categorized_value']).apply(pd.DataFrame.mode).reset_index(drop=True)

      

With the above, I get what look like random values.
How can I do this in Python?


1 answer


You might want to do something like this:

# df is the question's mydata.
# Most frequent categorized_value per category; if several values tie, keep the largest.
df['mode'] = df.groupby('category')['categorized_value'].transform(lambda s: s.mode().max())
print(df.drop_duplicates(['category', 'mode']).set_index('category').sort_index()[['categorized_value', 'mode']])

      



Updated the code to select the max categorized_value if more than one value is left after .mode.

          categorized_value  mode
category                         
aaaaa                   4.0   5.0
bbbbb                   3.2   3.2
ccccc                   3.5   3.5
xxxxx                   2.5   2.5
yyyyy                   4.0   4.0
zzzzz                   5.0   5.0
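If the goal is one full row per category rather than a side-by-side column, something along these lines might work. It is a sketch under a few assumptions: the frame is rebuilt from the question's sample data, the names `priority` and `result` are made up here, and when several rows carry the priority value the first one is kept.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "category": ["xxxxx", "yyyyy", "zzzzz", "aaaaa", "bbbbb", "ccccc",
                 "xxxxx", "xxxxx", "yyyyy", "yyyyy", "aaaaa"],
    "original_wt": [2.5, 3.5, 3.0, 4.0, 3.2, 4.6, 2.5, 4.0, 2.5, 3.0, 5.0],
    "predicted_wt": [3.0, 4.0, 5.0, 2.5, 5.5, 3.5, 4.0, 5.5, 4.0, 2.0, 4.5],
    "categorized": ["original", "predicted", "predicted", "original",
                    "original", "predicted", "original", "predicted",
                    "predicted", "predicted", "original"],
    "categorized_value": [2.5, 4.0, 5.0, 4.0, 3.2, 3.5, 2.5, 5.5, 4.0, 2.0, 5.0],
})

# Series.mode() returns every tied mode, so .max() keeps the highest one
priority = df.groupby("category")["categorized_value"].transform(
    lambda s: s.mode().max()
)

# Keep rows whose value equals the category's priority value,
# then keep the first such row for each category
result = (
    df[df["categorized_value"] == priority]
    .drop_duplicates("category")
    .reset_index(drop=True)
)
print(result)
```

With this rule, aaaaa keeps only its 5.0 row, since its two values tie and the higher one wins.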

      
