Python: replacing each element of a one-dimensional array with its group mean

Suppose I have 2 arrays:

x    = [2, 4, 1, 7, 3, 9, 2, 5, 5, 1]
flag = [0, 1, 0, 2, 1, 1, 2, 0, 0, 2]

      

The array flag indicates which group each element of x belongs to. How can I replace each element of x (with, say, a flag value of k) with the mean of all elements of x whose corresponding flag value is also k?

After this transformation, x would look like this:

x    = [3.25, 5.33, 3.25, 3.33, 5.33, 5.33, 3.33, 3.25, 3.25, 3.33]

      

(I could use loops to achieve this, but that would be pretty inefficient.)



3 answers


One option is to use Pandas:

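import pandas as pd
x    = [2, 4, 1, 7, 3, 9, 2, 5, 5, 1]
flag = [0, 1, 0, 2, 1, 1, 2, 0, 0, 2]
# Use the flags as the index so that grouping on index level 0 groups by flag
s = pd.Series(x, index=flag)
# transform('mean') returns each group's mean aligned to the original positions
s.groupby(level=0).transform('mean').tolist()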

      



Output:

[3.25,
 5.333333333333333,
 3.25,
 3.3333333333333335,
 5.333333333333333,
 5.333333333333333,
 3.3333333333333335,
 3.25,
 3.25,
 3.3333333333333335]

      



You can use np.bincount to calculate the grouped means:

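import numpy as np
x    = np.array([2, 4, 1, 7, 3, 9, 2, 5, 5, 1])
flag = np.array([0, 1, 0, 2, 1, 1, 2, 0, 0, 2])
total = np.bincount(flag, weights=x)  # sum of x within each group
count = np.bincount(flag)             # number of elements in each group
means = (total/count)[flag]           # per-group mean, indexed back to each element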

      

gives



array([ 3.25      ,  5.33333333,  3.25      ,  3.33333333,  5.33333333,
        5.33333333,  3.33333333,  3.25      ,  3.25      ,  3.33333333])
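Note that np.bincount only accepts non-negative integers as group labels. If the labels are something else (strings, negative numbers, or very large integers), one possible workaround is to map them to codes 0..n-1 first with np.unique(..., return_inverse=True). A minimal sketch, where the labels array is made-up data used only for illustration:

```python
import numpy as np

x = np.array([2, 4, 1, 7, 3, 9, 2, 5, 5, 1], dtype=float)
# Hypothetical string labels with the same grouping as flag (a=0, b=1, c=2)
labels = np.array(["a", "b", "a", "c", "b", "b", "c", "a", "a", "c"])

# codes[i] is the position of labels[i] among the sorted unique labels
uniques, codes = np.unique(labels, return_inverse=True)

# Same bincount approach as before, applied to the integer codes
group_means = np.bincount(codes, weights=x) / np.bincount(codes)
result = group_means[codes]
```

Here result holds the same values as means in the integer-flag version.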

      


For more general grouped statistics, there is also the function scipy.stats.binned_statistic. It can compute grouped means, medians, counts, sums, minimums, and maximums. It also accepts a custom function as the statistic, but that will (of course) be slower than the built-in statistics.
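As a rough sketch of how binned_statistic could be applied here, assuming the flags are small non-negative integers: put one bin around each flag value and map the per-bin result back through the returned bin numbers. The bin edges and the variable names edges, res, and medians are choices made for this example.

```python
import numpy as np
from scipy import stats

x    = np.array([2, 4, 1, 7, 3, 9, 2, 5, 5, 1])
flag = np.array([0, 1, 0, 2, 1, 1, 2, 0, 0, 2])

# One bin per integer flag value: edges at -0.5, 0.5, 1.5, 2.5
edges = np.arange(flag.max() + 2) - 0.5
res = stats.binned_statistic(flag, x, statistic='median', bins=edges)

# res.statistic[k] is the median of group k; res.binnumber is 1-based,
# so subtract 1 to index back into the per-group results
medians = res.statistic[res.binnumber - 1]
```

Passing statistic='mean' instead should reproduce the bincount result above.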



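>>> import numpy as np
>>> def grouped_mean(data, flags):
...     flag_set = set(flags)
...     flags = np.asarray(flags)
...     # Copy as float: an integer array would silently truncate the means
...     data = np.array(data, dtype=float)
...     for s in flag_set:
...         m = (flags == s)  # boolean mask selecting group s
...         data[m] = np.mean(data[m])
...     return data
...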

>>> grouped_mean(x, flag)
array([ 3.25      ,  5.33333333,  3.25      ,  3.33333333,  5.33333333,
        5.33333333,  3.33333333,  3.25      ,  3.25      ,  3.33333333])

      
