Python pandas gives wrong weekday index for DatetimeIndex
I want to take time series data and calculate the average number of rows per weekday (Monday, Tuesday, ...). My data looks like this:
timestamp maxCapacity
Mon Aug 4 14:47:00 EDT 2014 6741
Mon Aug 4 14:48:01 EDT 2014 6741
To do this, I first index the data frame by timestamp, then create a new column holding the weekday taken from the timestamp index. However, the new column does not contain the correct weekday numbers.
Here is the code to reproduce the problem:
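import os
import wget, pandas, csv
from dateutil import parser

url = 'https://www.dropbox.com/s/kbti3i8uzy82hw6/maxCapacity?dl=1'
dataFile = 'maxCapacitySample'

# Download the sample file only if it is not already present
if not os.path.exists(dataFile):
    wget.download(url, out=dataFile)

# Parse each timestamp string with dateutil
parse = lambda x: parser.parse(x)

# Read the tab-separated file and turn 'timestamp' into the 'Datetime' index
tdata = pandas.read_csv(dataFile,
                        parse_dates={"Datetime": ['timestamp']},
                        index_col='Datetime',
                        keep_date_col=False,
                        date_parser=parse,
                        dialect=csv.excel_tab)

# Derive the weekday number (Monday is 0) from the index
tdata['weekday'] = tdata.index.weekday
print(tdata.head())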
Output:
                           maxCapacity  weekday
Datetime
2014-08-04 14:40:00-04:00         6741        0
2014-08-04 14:47:00-04:00         6741        3
2014-08-04 14:48:01-04:00         6741        3
2014-08-04 14:49:00-04:00         6741        3
2014-08-04 14:50:00-04:00         6741        3
The problem is that rows from the same day (August 4) are assigned both weekday 0 and weekday 3. What am I doing wrong?
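One way to confirm the symptom on any frame is to count the distinct weekday values per calendar date. The sketch below is only an illustration and not part of the original code: the helper name `dates_with_mixed_weekdays` and the three-row sample frame are made up, and the weekday column is filled in by hand to mimic the output above.

```python
import pandas as pd


def dates_with_mixed_weekdays(df, column="weekday"):
    """Return calendar dates whose rows carry more than one weekday value."""
    # Group rows by the calendar date of the index (local wall-clock date)
    distinct_per_date = df.groupby(df.index.date)[column].nunique()
    return list(distinct_per_date[distinct_per_date > 1].index)


# Hypothetical frame; the weekday values are typed in by hand to mimic the symptom
idx = pd.DatetimeIndex(
    ["2014-08-04 14:40:00", "2014-08-04 14:47:00", "2014-08-05 09:00:00"]
).tz_localize("America/New_York")
sample = pd.DataFrame(
    {"maxCapacity": [6741, 6741, 6741], "weekday": [0, 3, 1]}, index=idx
)

mixed = dates_with_mixed_weekdays(sample)
print(mixed)
```

An empty list would mean every date maps to exactly one weekday value; here August 4 is reported because its rows carry both 0 and 3.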
1 answer
I managed to work around it by passing the index values through pandas.to_datetime before taking the weekday:
tdata['weekday'] = pandas.to_datetime(tdata.index.values).weekday
Resulting DataFrame:
                           maxCapacity  weekday
Datetime
2014-08-04 14:40:00-04:00         6741        0
2014-08-04 14:47:00-04:00         6741        0
2014-08-04 14:48:01-04:00         6741        0
2014-08-04 14:49:00-04:00         6741        0
2014-08-04 14:50:00-04:00         6741        0
2014-08-04 14:51:00-04:00         6741        0
2014-08-04 14:52:00-04:00         6741        0
2014-08-04 14:53:00-04:00         6741        0
2014-08-04 14:54:00-04:00         6741        0
2014-08-04 14:55:00-04:00         6741        0
...                                ...      ...
2014-08-20 09:37:00-04:00         6652        2
2014-08-20 09:38:00-04:00         6654        2
2014-08-20 09:39:00-04:00         6651        2
2014-08-20 09:40:00-04:00         6642        2
2014-08-20 09:41:00-04:00         6648        2
2014-08-20 09:42:00-04:00         6654        2
2014-08-20 09:43:00-04:00         6646        2
2014-08-20 09:44:00-04:00         6659        2
2014-08-20 09:45:00-04:00         6650        2
2014-08-20 09:46:00-04:00         6655        2
[6589 rows x 2 columns]
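With a consistent weekday column in place, the averaging described in the question could be sketched as follows. This is a minimal illustration on made-up rows, not the original data set, and the names `utc_weekday`, `local_weekday`, `daily`, and `avg_rows_per_weekday` are assumptions. It also compares two ways of deriving the weekday. In current pandas versions, `.values` on a time-zone-aware index yields UTC-based values, so the `to_datetime(...)` route reports the UTC weekday, while `tz_localize(None)` keeps the local wall-clock time.

```python
import pandas as pd

# Hypothetical sample in US Eastern time: two Mondays and one Tuesday
idx = pd.DatetimeIndex([
    "2014-08-04 14:47:00", "2014-08-04 14:48:01", "2014-08-04 21:30:00",
    "2014-08-05 09:00:00",
    "2014-08-11 10:00:00",
]).tz_localize("America/New_York")
tdata = pd.DataFrame({"maxCapacity": [6741, 6741, 6741, 6652, 6654]}, index=idx)

# .values gives UTC-based datetime64 values, so this is the weekday in UTC
utc_weekday = pd.to_datetime(tdata.index.values).weekday
# Dropping the time zone keeps the local wall-clock time instead
local_weekday = tdata.index.tz_localize(None).weekday

print(list(utc_weekday))    # 21:30 Eastern on Monday is already Tuesday in UTC
print(list(local_weekday))

tdata["weekday"] = local_weekday

# Count rows per calendar date, remembering each date's weekday
daily = tdata.groupby(tdata.index.date).agg(
    rows=("maxCapacity", "size"),
    weekday=("weekday", "first"),
)

# Average the per-date row counts within each weekday (Monday is 0)
avg_rows_per_weekday = daily.groupby("weekday")["rows"].mean()
print(avg_rows_per_weekday)
```

In this sample the two weekday variants differ only for the 21:30 row, so the choice mostly matters for data recorded in the evening or close to midnight.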