How can I split a dataset from a CSV file for training and testing?

Question

I am using Python and need to split my imported .csv data into two parts, a training set and a test set, e.g. 70% training and 30% test.

I keep getting various errors, such as 'list' object is not callable, and so on.

Is there an easy way to do this?

Thanks.

EDIT:

The code is basic, I just want to split the dataset.

from csv import reader
with open('C:/Dataset.csv', 'r') as f:
    data = list(reader(f)) #Imports the CSV
    data[0:1] ( data )

TypeError: 'list' object is not callable

+4

python split csv data-science

Midi Apr 29 '17 at 15:13

3 answers

You should use the read_csv() function from the pandas module. It reads all of your data straight into a DataFrame, which you can then split into training and test sets. For the split itself, you can use the train_test_split() function from the scikit-learn module.
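
As a rough sketch of the two calls mentioned above, assuming pandas and scikit-learn are installed: the in-memory CSV text and its column names are made up for illustration and stand in for your own file, and random_state=42 is an arbitrary seed chosen only so the split is repeatable.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real file; in practice, pass the path
# to your own CSV (e.g. pd.read_csv('C:/Dataset.csv')).
csv_text = "feature,label\n" + "\n".join(f"{i},{i % 2}" for i in range(10))
df = pd.read_csv(io.StringIO(csv_text))

# test_size=0.3 keeps 30% of the rows for testing; the rows are shuffled
# before splitting. random_state is an arbitrary seed for repeatability.
train, test = train_test_split(df, test_size=0.3, random_state=42)

print(len(train), len(test))
```

Both train and test come back as DataFrames, so they keep the original column names and index labels.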

+2

dr_dronych Apr 29 '17 at 15:43

A better practice, and possibly a more random one, is to use df.sample:

from numpy.random import RandomState
import pandas as pd

df = pd.read_csv('C:/Dataset.csv')
rng = RandomState()

train = df.sample(frac=0.7, random_state=rng)
test = df.loc[~df.index.isin(train.index)]
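
A note on the snippet above: RandomState() with no seed yields a different split on every run. If a repeatable split is wanted, one option is to pass an integer seed straight to df.sample. The sketch below assumes a small made-up DataFrame in place of the CSV file, and the seed value 0 is arbitrary.

```python
import pandas as pd

# Made-up data standing in for pd.read_csv('C:/Dataset.csv')
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# An integer random_state makes the sample identical on every run
train = df.sample(frac=0.7, random_state=0)

# Everything not sampled into train becomes the test set
# (this relies on the index labels being unique)
test = df.drop(train.index)
```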

0

Flair 10 Aug 17 at 19:15

zipa · Accepted Answer · 2017-04-29T15:48:35+0000

You can use pandas:

import pandas as pd
import numpy as np

df = pd.read_csv('C:/Dataset.csv')

# Boolean mask: each row lands in the training set with probability 0.7,
# so the split is approximately (not exactly) 70/30
msk = np.random.rand(len(df)) <= 0.7

train = df[msk]
test = df[~msk]
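
As for the TypeError in the question: data[0:1] ( data ) takes a slice of the list and then tries to call that slice as a function, which is what raises 'list' object is not callable. Slicing alone is enough. If pandas is not available, a rough standard-library sketch could look like the following; the in-memory CSV text, the split_rows helper, and the seed are all illustrative assumptions rather than part of the original answer.

```python
import csv
import io
import random


def split_rows(rows, train_frac=0.7, seed=0):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    shuffled = list(rows)                    # copy, so the input stays untouched
    random.Random(seed).shuffle(shuffled)    # seeded for a repeatable split
    cut = round(train_frac * len(shuffled))  # number of training rows
    return shuffled[:cut], shuffled[cut:]


# Made-up CSV content standing in for open('C:/Dataset.csv')
csv_text = "feature,label\n" + "\n".join(f"{i},{i % 2}" for i in range(10))
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)   # keep the header row out of the data
data = list(reader)

train, test = split_rows(data)
```

Unlike the random mask, this cuts the shuffled rows at a fixed position, so the sizes are exactly 70/30 up to rounding.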
