Datetime index arithmetic in pandas
In pandas, you can access specific positions of a time series either by classic integer indexing (by position / row) or by datetime indexing. An integer index can be manipulated with basic arithmetic: e.g. if I have integer_index for a time series with a frequency of 12 hours and I want to access the record exactly one day earlier, I can simply use integer_index - 2. However, real-world data is not always perfect, and sometimes rows are missing. In that case this method fails, and it would be useful to be able to use date indexing and subtract one day from that index, for example. How can I do this?
Example script:
# generate a sample time series
import pandas as pd
s = pd.Series(["A", "B", "C", "D", "E"], index=pd.date_range("2000-01-01", periods=5, freq="12h"))
print(s)
# 2000-01-01 00:00:00 A
# 2000-01-01 12:00:00 B
# 2000-01-02 00:00:00 C
# 2000-01-02 12:00:00 D
# 2000-01-03 00:00:00 E
# Freq: 12H, dtype: object
# these two indices should access the same value ("C")
integer_index = 2
date_index = "2000-01-02 00:00"
print(s.iloc[integer_index])  # prints "C" (positional access)
print(s[date_index])  # prints "C"
# I can access the value one day earlier by subtracting 2 from the integer index
print(s.iloc[integer_index - 2])  # prints "A"
# how can I subtract one day from the date index?
print(s[date_index - 1])  # raises a TypeError: cannot subtract an int from a str
The background to this question can be found in an earlier post of mine here:
Fill in data gaps with average data from adjacent days
where user JohnE found a workaround for my problem that uses integer position indexing. He makes sure that I have evenly spaced data by resampling the time series.
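For context, here is a minimal sketch of why resampling makes integer arithmetic safe again. It assumes `Series.asfreq` as a stand-in for the resampling step; the linked answer's actual code is not reproduced here, and the name `gappy` is made up for illustration.

```python
import pandas as pd

s = pd.Series(
    ["A", "B", "C", "D", "E"],
    index=pd.date_range("2000-01-01", periods=5, freq="12h"),
)

# Simulate a gap by dropping the 2000-01-01 12:00 row ("B").
gappy = s.drop(pd.Timestamp("2000-01-01 12:00"))

# Re-create the regular 12-hour grid; the missing row comes back as NaN.
regular = gappy.asfreq("12h")

# Positions line up with the original series again,
# so "position - 2" once more means "one day earlier".
integer_index = 2
value = regular.iloc[integer_index]
previous_day_value = regular.iloc[integer_index - 2]
```

On the regular grid the gap is an explicit NaN row rather than a missing row, which is what keeps the positional offsets meaningful.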
Your datetime index is not string based; it is a DatetimeIndex, which means you can index it with datetime objects rather than with a string that merely looks like a date.
The code below converts date_index to a datetime object and then uses timedelta(days=1) to subtract "one day" from it.
# generate a sample time series
import pandas as pd
from datetime import datetime, timedelta
s = pd.Series(["A", "B", "C", "D", "E"], index=pd.date_range("2000-01-01", periods=5, freq="12h"))
print(s)
# these two indices should access the same value ("C")
integer_index = 2
# Converts the string into a datetime object
date_index = datetime.strptime("2000-01-02 00:00", "%Y-%m-%d %H:%M")
print(date_index) # 2000-01-02 00:00:00
print(s.iloc[integer_index]) # prints "C" (positional access)
print(s[date_index]) # prints "C"
print(s.iloc[integer_index - 2]) # prints "A"
one_day = timedelta(days=1)
print(s[date_index - one_day]) # prints "A"
print(date_index - one_day) # 2000-01-01 00:00:00
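Since the question is motivated by missing rows, it may help to see what happens when the row one day earlier does not exist. The following is only a sketch under that assumption: the `gappy` series and the choice of `Series.get` as a fallback are illustrative and not part of the original answer.

```python
import pandas as pd
from datetime import datetime, timedelta

s = pd.Series(
    ["A", "B", "C", "D", "E"],
    index=pd.date_range("2000-01-01", periods=5, freq="12h"),
)

# Simulate a gap by dropping the 2000-01-02 00:00 row ("C").
gappy = s.drop(pd.Timestamp("2000-01-02 00:00"))

date_index = datetime(2000, 1, 3, 0, 0)
one_day = timedelta(days=1)

# The row one day earlier is missing, so a plain label lookup raises KeyError.
try:
    gappy[date_index - one_day]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Series.get returns a default instead of raising.
previous_day = gappy.get(date_index - one_day, default=None)

# A label that does exist still resolves as usual (two days earlier here).
two_days_earlier = gappy[date_index - 2 * one_day]
```

The date arithmetic itself is unaffected by the gap; only the lookup needs to tolerate a label that is not in the index.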
The previous answer from Ffisegydd is excellent, except that pandas provides an equivalent Timedelta class, pd.Timedelta, which is compatible with np.timedelta64 and has a few additional features. Just replace timedelta(days=1) with pd.Timedelta(days=1) in his example for better compatibility.
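A minimal sketch of that substitution, using pd.Timestamp for the lookup key as well. The variable names are illustrative, and the vectorized shift at the end is an extra not taken from either answer.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    ["A", "B", "C", "D", "E"],
    index=pd.date_range("2000-01-01", periods=5, freq="12h"),
)

date_index = pd.Timestamp("2000-01-02 00:00")
one_day = pd.Timedelta(days=1)

# Same lookup as before, using pandas-native types throughout.
previous_day_value = s[date_index - one_day]

# pd.Timedelta compares equal to the corresponding np.timedelta64 value.
same_duration = one_day == np.timedelta64(1, "D")

# It can also be built from a string.
also_one_day = pd.Timedelta("1 day")

# The subtraction is vectorized: this shifts every timestamp in the index.
shifted_index = s.index - one_day
```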