Pandas time comparison
I want to get the average difference between the overlapping parts of two time series. However, both their ranges and their intervals are different. What's the best way to solve these two problems?
Sample data:
Series 1:
2014-08-05 05:03:00 25.194
2014-08-05 05:08:00 25.196
2014-08-05 05:13:00 25.197
2014-08-05 05:18:00 25.199
2014-08-05 05:23:00 25.192
Series 2:
2014-08-05 05:12:00 25.000000
2014-08-05 05:13:00 25.000000
2014-08-05 05:14:00 25.000000
+3
source to share
1 answer
Is this what you are looking for?
First you can align both series (so they both have the same index. It is also possible to only reindex one of them into the index of the other with reindex
):
In [85]: s1, s2 = s1.align(s2)
In [86]: s1
Out[86]:
2014-08-05 05:03:00 25.194
2014-08-05 05:08:00 25.196
2014-08-05 05:12:00 NaN
2014-08-05 05:13:00 25.197
2014-08-05 05:14:00 NaN
2014-08-05 05:18:00 25.199
2014-08-05 05:23:00 25.192
dtype: float64
In [87]: s2
Out[87]:
2014-08-05 05:03:00 NaN
2014-08-05 05:08:00 NaN
2014-08-05 05:12:00 25
2014-08-05 05:13:00 25
2014-08-05 05:14:00 25
2014-08-05 05:18:00 NaN
2014-08-05 05:23:00 NaN
dtype: float64
Then you can interpolate the missing values ββ(e.g. linear interpolation based on the time index):
In [88]: s1.interpolate(method='time')
Out[88]:
2014-08-05 05:03:00 25.1940
2014-08-05 05:08:00 25.1960
2014-08-05 05:12:00 25.1968
2014-08-05 05:13:00 25.1970
2014-08-05 05:14:00 25.1974
2014-08-05 05:18:00 25.1990
2014-08-05 05:23:00 25.1920
dtype: float64
And then just align both rows to get the difference:
In [91]: s = s1.interpolate(method='time') - s2.interpolate(method='time')
In [92]: s
Out[92]:
2014-08-05 05:03:00 NaN
2014-08-05 05:08:00 NaN
2014-08-05 05:12:00 0.1968
2014-08-05 05:13:00 0.1970
2014-08-05 05:14:00 0.1974
2014-08-05 05:18:00 0.1990
2014-08-05 05:23:00 0.1920
dtype: float64
In [93]: s.mean()
Out[93]: 0.19643999999999906
+5
source to share