Python: replacing grouped values with their group means in a one-dimensional array
Suppose I have 2 arrays:
x = [2, 4, 1, 7, 3, 9, 2, 5, 5, 1]
flag = [0, 1, 0, 2, 1, 1, 2, 0, 0, 2]
The array flag indicates which group each element of x belongs to. How can I replace each element of x that has a given flag value k with the mean of all elements of x whose corresponding flag value is also k?
After this transformation, x should look like this:
x = [3.25, 5.33, 3.25, 3.33, 5.33, 5.33, 3.33, 3.25, 3.25, 3.33]
(I could use loops to achieve this, but that would be pretty inefficient.)
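For comparison, an explicit loop does not have to rescan the array for every group. Accumulating per-group sums and counts in dictionaries takes two linear passes. The following is a minimal pure-Python sketch (grouped_mean_loop is a hypothetical name, not from any library). It is O(n), but per element it is still slower than the vectorized NumPy and pandas answers.

```python
from collections import defaultdict

def grouped_mean_loop(x, flag):
    """Replace each x[i] with the mean of all x[j] where flag[j] == flag[i]."""
    totals = defaultdict(float)  # running sum per group label
    counts = defaultdict(int)    # number of elements per group label
    for value, key in zip(x, flag):
        totals[key] += value
        counts[key] += 1
    # Second pass: look up each element's group mean.
    return [totals[key] / counts[key] for key in flag]

x = [2, 4, 1, 7, 3, 9, 2, 5, 5, 1]
flag = [0, 1, 0, 2, 1, 1, 2, 0, 0, 2]
print(grouped_mean_loop(x, flag))
```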
One option is to use Pandas:
import pandas as pd

x = [2, 4, 1, 7, 3, 9, 2, 5, 5, 1]
flag = [0, 1, 0, 2, 1, 1, 2, 0, 0, 2]

s = pd.Series(x, index=flag)
s.groupby(level=0).transform('mean').tolist()
Output:
[3.25,
5.333333333333333,
3.25,
3.3333333333333335,
5.333333333333333,
5.333333333333333,
3.3333333333333335,
3.25,
3.25,
3.3333333333333335]
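A variation on the same pandas idea: pass flag directly as the grouping key instead of turning it into the index. Series.groupby accepts a list or array of the same length as the Series, and transform('mean') returns a result aligned with the Series' original index. The x values therefore keep their default positional index. This is a sketch assuming a reasonably recent pandas version.

```python
import pandas as pd

x = [2, 4, 1, 7, 3, 9, 2, 5, 5, 1]
flag = [0, 1, 0, 2, 1, 1, 2, 0, 0, 2]

# Group by a separate key sequence; the result keeps x's original
# (default RangeIndex) order, so .tolist() lines up element-wise with x.
means = pd.Series(x).groupby(flag).transform("mean").tolist()
print(means)
```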
You can use np.bincount to compute the group means:
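import numpy as np

x = np.array([2, 4, 1, 7, 3, 9, 2, 5, 5, 1])
flag = np.array([0, 1, 0, 2, 1, 1, 2, 0, 0, 2])

total = np.bincount(flag, weights=x)  # sum of x for each flag value
count = np.bincount(flag)             # number of elements for each flag value
means = (total/count)[flag]           # map each group mean back to its elements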
gives
array([ 3.25 , 5.33333333, 3.25 , 3.33333333, 5.33333333,
5.33333333, 3.33333333, 3.25 , 3.25 , 3.33333333])
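np.bincount only accepts non-negative integers and allocates an array of length flag.max() + 1. Negative, sparse, or non-integer labels therefore need an extra step. One way to handle them is to first map the labels to dense ids 0..k-1 with np.unique(..., return_inverse=True) and then apply the same bincount trick. This is a minimal sketch. The function name grouped_mean_labels and the string labels are hypothetical, chosen only for illustration.

```python
import numpy as np

def grouped_mean_labels(x, labels):
    """Replace each x[i] with the mean of its group; labels may be any sortable values."""
    x = np.asarray(x, dtype=float)
    # inverse maps each element to a dense group id in 0..k-1
    _, inverse = np.unique(labels, return_inverse=True)
    total = np.bincount(inverse, weights=x)  # per-group sums
    count = np.bincount(inverse)             # per-group sizes (all > 0)
    return (total / count)[inverse]

x = [2, 4, 1, 7, 3, 9, 2, 5, 5, 1]
labels = ["a", "b", "a", "z", "b", "b", "z", "a", "a", "z"]
print(grouped_mean_labels(x, labels))
```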
For more general grouped statistics, there is also scipy.stats.binned_statistic. It can compute the mean, median, count, sum, minimum, or maximum of each group. It also accepts a custom function as the statistic, but that is (naturally) slower than the built-in statistics.
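binned_statistic bins a continuous variable. To use it for discrete groups, each flag value needs its own bin. The sketch below assumes flag holds integers 0..k-1 and places bin edges at half-integers. It uses statistic="median" to show something np.bincount cannot do directly. The returned binnumber is 1-based for in-range samples, which is why 1 is subtracted before indexing.

```python
import numpy as np
from scipy.stats import binned_statistic

x = np.array([2, 4, 1, 7, 3, 9, 2, 5, 5, 1], dtype=float)
flag = np.array([0, 1, 0, 2, 1, 1, 2, 0, 0, 2])

# Assumes flag holds integers 0..k-1; edges -0.5, 0.5, ..., k-0.5
# give each integer label its own bin.
edges = np.arange(flag.max() + 2) - 0.5
stat, _, binnumber = binned_statistic(flag, x, statistic="median", bins=edges)

# binnumber is 1-based for samples inside the bin range.
medians = stat[binnumber - 1]
print(medians)
```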
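>>> import numpy as np
>>> def grouped_mean(data, flags):
...     flags = np.asarray(flags)
...     # Float copy: with integer input, np.array(data) would be an int array
...     # and the assigned means would be truncated (3.25 -> 3).
...     data = np.array(data, dtype=float)
...     for s in np.unique(flags):
...         m = (flags == s)             # mask of the elements in group s
...         data[m] = np.mean(data[m])   # overwrite them with the group mean
...     return data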
...
>>> grouped_mean(x, flag)
array([ 3.25 , 5.33333333, 3.25 , 3.33333333, 5.33333333,
5.33333333, 3.33333333, 3.25 , 3.25 , 3.33333333])