Splitting a time-indexed pandas DataFrame into training and test sets
If I have a time-indexed DataFrame, how can I split it into training and test sets, with 2/3 of the rows for training and 1/3 for testing?
Should I create a new incrementing integer column and then call set_index() on it?
Or can I do this while keeping the time index? If so, I have no idea how to do it.
Do I need to manually select the date to act as a split point, or is there some other way?
Just use iloc, which indexes by integer position. The fact that the index is a datetime type doesn't matter when using iloc.
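In [6]:
import pandas as pd
from numpy import floor  # assumed: numpy's floor, which returns a float (hence the warning below)
df = pd.DataFrame({'a':['1','2','3','4','5']})
df.iloc[0:floor(2 * len(df) / 3)]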
C:\WinPython-64bit-3.3.5.0\python-3.3.5.amd64\lib\site-packages\pandas\core\index.py:687: FutureWarning: slice indexers when using iloc should be integers and not floating point
"and not floating point",FutureWarning)
Out[6]:
   a
0  1
1  2
2  3
In [7]:
df.iloc[floor(2 * len(df) /3):]
Out[7]:
   a
3  4
4  5
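You can ignore the warning here. floor is used because 2 * len(df) / 3 evaluates to 3.3333, which is not a valid index value. The warning appears because the result of floor is still a floating-point number; wrapping it in int() or using integer division (2 * len(df) // 3) avoids it.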
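The same positional split can be sketched on a DataFrame that actually has a time index. This is a minimal illustration, not the original poster's data: the nine daily rows and the names idx, split, train, test and split_date are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: nine daily observations with a DatetimeIndex.
idx = pd.date_range("2020-01-01", periods=9, freq="D")
df = pd.DataFrame({"a": range(9)}, index=idx)

# Integer division keeps the split position an int, so iloc gets a valid slice bound.
split = 2 * len(df) // 3

train = df.iloc[:split]  # first two thirds of the rows, in time order
test = df.iloc[split:]   # remaining third

# The split position can be mapped back to a timestamp if a date is wanted.
split_date = df.index[split]
print(len(train), len(test), split_date)
```

Both pieces keep the original DatetimeIndex, so no extra integer column is needed and the split date does not have to be chosen by hand.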
You can also use scikit-learn's cross-validation utilities, which will return the train and test split indices for you.
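As a rough sketch of the scikit-learn route, assuming a recent release where train_test_split lives in sklearn.model_selection and accepts shuffle=False (the answer does not say which function or version it has in mind), the split might look like this. The sample data and the name n_test are made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: nine daily observations with a DatetimeIndex.
idx = pd.date_range("2020-01-01", periods=9, freq="D")
df = pd.DataFrame({"a": range(9)}, index=idx)

# Number of rows left over after taking the first two thirds.
n_test = len(df) - 2 * len(df) // 3

# shuffle=False keeps the rows in their original (time) order, so the
# test set is the final third rather than a random sample.
train, test = train_test_split(df, test_size=n_test, shuffle=False)
print(train.index.max(), test.index.min())
```

Leaving shuffle at its default would mix later rows into the training set, which is usually not what is wanted for time-ordered data.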