Splitting pandas data into training and test sets when indexed by time

If I have a time-indexed dataframe, how can I split it into training and test sets, with two thirds of the rows for training and one third for testing?

Should I create a new incrementing integer column and then call set_index() on that column?

Or can I do this while keeping the time index? If so, I have no idea how to do it.

Do I need to manually select the date to act as a split point, or is there some other way?



1 answer


Just use iloc, which is an integer-position-based indexing method; the fact that the index is a datetime type doesn't matter when using iloc.

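In [6]:

import pandas as pd
# Assumption: floor here is NumPy's (e.g. from a pylab session); it returns a float,
# which is what triggers the FutureWarning shown below.
from numpy import floor

df = pd.DataFrame({'a':['1','2','3','4','5']})
df.iloc[0: floor(2 * len(df)/3)]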

C:\WinPython-64bit-3.3.5.0\python-3.3.5.amd64\lib\site-packages\pandas\core\index.py:687: FutureWarning: slice indexers when using iloc should be integers and not floating point
  "and not floating point",FutureWarning)
Out[6]:
   a
0  1
1  2
2  3
In [7]:

df.iloc[floor(2 * len(df) /3):]
Out[7]:
   a
3  4
4  5

      



You can ignore the warning here; floor is used because 3.3333 is not a valid index value.
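As a minimal sketch of the same idea on an actual time index: the dataframe below is made-up illustration data (the dates and the column `a` are assumptions, not from the question). Integer division (`//`) always yields an int, so the float-index warning from the session above does not come up. This assumes the index is already in chronological order; if it might not be, sorting with `df.sort_index()` first seems prudent.

```python
import pandas as pd

# Hypothetical daily series with 9 rows (illustration data only)
idx = pd.date_range("2014-01-01", periods=9, freq="D")
df = pd.DataFrame({"a": range(9)}, index=idx)

# Integer division gives an int split position, so iloc gets no float
split = 2 * len(df) // 3

train = df.iloc[:split]   # first two thirds of the rows
test = df.iloc[split:]    # remaining third

# The time index is kept; the first test timestamp is the implied split date
split_date = df.index[split]
```

Because the split is positional, no split date has to be chosen by hand; `split_date` is only derived afterwards in case it is useful for reporting.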

You can also use scikit-learn's cross-validation utilities, which will return the train and test split indices for you.
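The answer does not say which scikit-learn utility it has in mind. One option that respects time order is `TimeSeriesSplit` from `sklearn.model_selection`; the sketch below uses it as an assumption, again on made-up data. With `n_splits=2` and a row count divisible by three, the last fold happens to be a two-thirds / one-third split.

```python
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical daily series with 6 rows (illustration data only)
idx = pd.date_range("2014-01-01", periods=6, freq="D")
df = pd.DataFrame({"a": range(6)}, index=idx)

# Each fold is a pair of integer position arrays; training rows always precede test rows
tscv = TimeSeriesSplit(n_splits=2)
splits = list(tscv.split(df))

# Take the last fold: the earliest rows for training, the final block for testing
train_idx, test_idx = splits[-1]
train = df.iloc[train_idx]
test = df.iloc[test_idx]
```

The returned indices are positions, so they are passed to iloc just like the manual split, and the time index is preserved on both pieces.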
