# Create weighted average for irregular time series in pandas

From simulation data with a variable timestep, I have an irregular time vector as the index for my values, which are stored in a `pandas.DataFrame`.

Let's consider a simplified test case:

```python
import pandas as pd
import datetime
time_vec = [datetime.time(0, 0), datetime.time(0, 0), datetime.time(0, 5), datetime.time(0, 7), datetime.time(0, 10)]
df = pd.DataFrame([1, 2, 4, 3, 6], index=time_vec)
```

Using the normal `df.mean()` function gives 3.2, which would only be correct if the time vector were equidistant.

I think the correct result is 3.55: over the first time interval (zero minutes) the average is 1.5, over the second (five minutes) it is 3, and so on, which gives:

```
1.5 * 0 + 3 * 5 + 3.5 * 2 + 4.5 * 3 = 35.5
```

which results in an average of 3.55 (35.5 / (0 + 5 + 2 + 3)).
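The calculation above is the trapezoidal rule divided by the total duration. A minimal NumPy sketch with the question's test values, using the times as plain floats in minutes (an assumption for illustration only), reproduces the 35.5 and 3.55:

```python
import numpy as np

# Sample times in minutes and values from the test case in the question
t = np.array([0, 0, 5, 7, 10], dtype=float)
y = np.array([1, 2, 4, 3, 6], dtype=float)

dt = np.diff(t)                         # interval lengths: [0, 5, 2, 3]
interval_means = (y[:-1] + y[1:]) / 2   # [1.5, 3.0, 3.5, 4.5]
area = np.sum(interval_means * dt)      # 35.5
time_weighted_mean = area / dt.sum()    # 35.5 / 10
print(time_weighted_mean)
```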

Is there an efficient way to do this using pandas?

Ideally, this would be usable as something like

```python
df.resample('15M',how = 'This very Method I am looking for')
```

to generate averages with an equidistant time vector.
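For reference, the per-bin version asked for here can be sketched without a dedicated resample method. The helper below, `time_weighted_resample`, is hypothetical (it is not part of pandas) and assumes, like the interval averages above, that the values vary linearly between samples. It inserts linearly interpolated points at the bin edges so that no interval straddles two bins, then divides each bin's trapezoidal area by its duration. Note that in pandas offset aliases `M` means month end, not minutes; 15-minute bins are written `'15min'`.

```python
import numpy as np
import pandas as pd


def time_weighted_resample(s, freq):
    """Hypothetical helper: trapezoidal time-weighted mean of `s` per bin of width `freq`.

    Assumes `s` is a numeric Series with a DatetimeIndex and that the values
    vary linearly between consecutive samples.
    """
    # At duplicated timestamps keep the last value; the zero-length interval
    # between duplicates carries no weight anyway
    s = s[~s.index.duplicated(keep="last")].sort_index()
    start, end = s.index[0], s.index[-1]

    # Add the bin edges that fall strictly inside the data range as extra points,
    # with linearly interpolated values, so no interval straddles two bins
    edges = pd.date_range(start.floor(freq), end.ceil(freq), freq=freq)
    edges = edges[(edges > start) & (edges < end)]
    s = s.reindex(s.index.union(edges)).interpolate(method="time")

    t = s.index
    y = s.to_numpy(dtype=float)
    dt = (t[1:] - t[:-1]).total_seconds().to_numpy()  # interval lengths in seconds
    areas = (y[:-1] + y[1:]) / 2 * dt                   # trapezoid per interval
    bins = t[:-1].floor(freq)                           # bin in which each interval starts

    # Time-weighted mean per bin = total area / total duration
    area = pd.Series(areas).groupby(bins).sum()
    duration = pd.Series(dt).groupby(bins).sum()
    return area / duration


# Usage with the test case from the question, as a Series with a DatetimeIndex
s = pd.Series([1, 2, 4, 3, 6],
              index=pd.to_datetime(["2007-01-01 00:00", "2007-01-01 00:00",
                                    "2007-01-01 00:05", "2007-01-01 00:07",
                                    "2007-01-01 00:10"]))
print(time_weighted_resample(s, "15min"))
```

For the ten minutes of test data this gives a single 15-minute bin with 3.55; with `'5min'` it gives 3.0 for 00:00 and 4.1 for 00:05. Keeping only the last value at a duplicated timestamp matches the zero-weighted first term in the sum above.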


Well, I figured out how to solve my problem. I don't know whether this is a good solution, but it works.

I changed the original code in the question, replacing `datetime.time` with `datetime.datetime`; otherwise it doesn't work, because `datetime.time` objects cannot be subtracted from each other, so there is no `timedelta` whose `total_seconds()` method could be called. I also had to import NumPy to use `numpy.average`.
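A quick check of the `datetime.time` limitation, plus an alternative to rewriting the time vector by hand: `datetime.datetime.combine` attaches a date to each `time`. The date 2007-01-01 is arbitrary (chosen to match the rewritten code below), and this assumes all samples fall on the same day.

```python
import datetime

# Subtracting datetime.time objects raises TypeError, so there is no
# timedelta (and hence no total_seconds()) to work with
try:
    datetime.time(0, 5) - datetime.time(0, 0)
    time_subtraction_error = None
except TypeError as exc:
    time_subtraction_error = exc

# Attaching a fixed date makes the values subtractable
# (assumption: all samples lie on the same, arbitrary day)
day = datetime.date(2007, 1, 1)
time_vec = [datetime.time(0, 0), datetime.time(0, 0), datetime.time(0, 5),
            datetime.time(0, 7), datetime.time(0, 10)]
datetime_vec = [datetime.datetime.combine(day, t) for t in time_vec]

delta = datetime_vec[2] - datetime_vec[1]
print(delta.total_seconds())  # 300.0
```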

So the setup code now looks like this:

```python
import datetime
import numpy as np
import pandas as pd

time_vec = [datetime.datetime(2007, 1, 1, 0, 0),
            datetime.datetime(2007, 1, 1, 0, 0),
            datetime.datetime(2007, 1, 1, 0, 5),
            datetime.datetime(2007, 1, 1, 0, 7),
            datetime.datetime(2007, 1, 1, 0, 10)]
df = pd.DataFrame([1, 2, 4, 3, 6], index=time_vec)
```

This little function solved my problem:

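```python
def time_based_weighted_mean(tv_df):
    # Length of each interval between consecutive timestamps, in seconds
    time_delta = [(x - y).total_seconds() for x, y in zip(tv_df.index[1:], tv_df.index[:-1])]
    # Each sample is weighted by the intervals on both sides of it (0 at the ends)
    weights = [x + y for x, y in zip([0] + time_delta, time_delta + [0])]
    # axis=0 is required because weights is 1-D while the DataFrame is 2-D
    res = np.average(tv_df, axis=0, weights=weights)
    return res

print(time_based_weighted_mean(df))
```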
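Why these weights reproduce the interval-by-interval calculation from the question: every value is an endpoint of the interval before it and of the interval after it, so in the trapezoidal sum it is multiplied by half of each adjacent interval length. With samples $y_0, \dots, y_{n-1}$ at times $t_0, \dots, t_{n-1}$, $\Delta t_i = t_i - t_{i-1}$ for $i = 1, \dots, n-1$, and the padding $\Delta t_0 = \Delta t_n = 0$:

$$
\bar{y}
= \frac{\sum_{i=1}^{n-1} \tfrac{1}{2}\,(y_{i-1} + y_i)\,\Delta t_i}{\sum_{i=1}^{n-1} \Delta t_i}
= \frac{\tfrac{1}{2} \sum_{i=0}^{n-1} y_i\,(\Delta t_i + \Delta t_{i+1})}{\tfrac{1}{2} \sum_{i=0}^{n-1} (\Delta t_i + \Delta t_{i+1})}
$$

The weights $w_i = \Delta t_i + \Delta t_{i+1}$ are what `weights` computes (for the test data 0, 300, 420, 300 and 180 seconds), and the factor $\tfrac{1}{2}$ cancels because `np.average` divides by the sum of the weights.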

At first I tried to compute the `time_delta` array with the index's `diff()`, but this gave me `numpy.timedelta64` values rather than numbers, and I didn't know how to convert them to floats, which `np.average` needs for the weights.
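For the conversion problem described above: differences of `datetime64` values are `timedelta64`, and dividing them by `np.timedelta64(1, "s")` yields plain float seconds. A minimal vectorized sketch of the same weights as in `time_based_weighted_mean`, assuming the `df` from the setup code (rebuilt here so the block runs on its own); `df.index.to_series().diff().dt.total_seconds()` would be a pandas-only alternative, with `NaN` as its first element.

```python
import datetime

import numpy as np
import pandas as pd

time_vec = [datetime.datetime(2007, 1, 1, 0, 0),
            datetime.datetime(2007, 1, 1, 0, 0),
            datetime.datetime(2007, 1, 1, 0, 5),
            datetime.datetime(2007, 1, 1, 0, 7),
            datetime.datetime(2007, 1, 1, 0, 10)]
df = pd.DataFrame([1, 2, 4, 3, 6], index=time_vec)

# Differences of datetime64 values are timedelta64; dividing by a
# one-second timedelta64 turns them into float seconds
dt = np.diff(df.index.to_numpy()) / np.timedelta64(1, "s")

# Same weights without Python loops: each sample gets the lengths of
# the intervals on both sides of it (0 beyond the first and last sample)
weights = np.concatenate(([0.0], dt)) + np.concatenate((dt, [0.0]))

result = np.average(df.to_numpy(), axis=0, weights=weights)
print(result)
```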

I'd be grateful for any suggestions for improving the code.
