How to randomly select rows from a dataset using pandas?

I have a dataset with 36k rows. I want to randomly select 9k rows from it using pandas. How do you accomplish this task?

2 answers


I think you can use sample to select either 9k rows or 25% of the rows:

df.sample(n=9000)

      

Or:

df.sample(frac=0.25)
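A minimal sketch of both calls on a made-up 36k-row frame, in case it helps to see them end to end. The column names and data here are hypothetical, and random_state is an optional argument of sample that I am adding only so the draw is repeatable:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 36k-row dataset from the question
df = pd.DataFrame({"a": np.arange(36000), "b": np.arange(36000) * 2})

# Exactly 9000 rows, drawn without replacement (the default for sample)
by_count = df.sample(n=9000, random_state=0)

# 25% of the rows, which is also 9000 rows for a 36k-row frame
by_frac = df.sample(frac=0.25, random_state=0)

print(len(by_count), len(by_frac))
```

Both results keep the original index labels, so the selected rows can still be traced back to the source frame.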

      



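Another solution is to draw a random sample of the index with numpy.random.choice and then select those rows with loc. The index has to be unique, and replace=False is needed because choice samples with replacement by default:

df = df.loc[np.random.choice(df.index, size=9000, replace=False)]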

      

Solution if the index is not unique (select by position with iloc, again without replacement):

df = df.iloc[np.random.choice(np.arange(len(df)), size=9000, replace=False)]
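To illustrate the positional variant, here is a rough sketch on a frame whose index deliberately contains duplicates. The data is invented, and I am using numpy's newer default_rng generator instead of np.random.choice purely as an assumption about style; np.random.choice draws with replacement unless told otherwise, so replace=False is passed explicitly to avoid picking the same row twice:

```python
import numpy as np
import pandas as pd

# Hypothetical 36k-row frame with a non-unique index (labels 0..99 repeated)
df = pd.DataFrame({"a": np.arange(36000)}, index=np.arange(36000) % 100)

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Draw 9000 distinct row positions, then select them by position
pos = rng.choice(len(df), size=9000, replace=False)
sampled = df.iloc[pos]

print(len(sampled), sampled["a"].is_unique)
```

Selecting by position sidesteps the duplicate labels entirely, which is why iloc is the safer choice here than loc.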

      



Using numpy:



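import numpy as np
import pandas as pd
# shuffle all row positions, then keep the first 9000
i = np.random.permutation(np.arange(len(df)))
idx = i[:9000]
pd.DataFrame(df.values[idx], df.index[idx], df.columns)  # pass columns to keep their names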
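One thing to watch with this approach: df.values converts a frame with mixed column types into a single common dtype, so the rebuilt frame can lose the original dtypes. A possible variation, sketched here on made-up data with a seeded generator (both are my assumptions, not part of the answer above), is to keep the permutation idea but select the rows with iloc:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one text column
df = pd.DataFrame({"num": np.arange(36000), "txt": ["x"] * 36000})

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Shuffle all row positions and keep the first 9000 of them
idx = rng.permutation(len(df))[:9000]

# iloc keeps column names, index labels and per-column dtypes
sampled = df.iloc[idx]

print(sampled.dtypes)
```

Because a permutation never repeats a position, the 9000 rows are guaranteed to be distinct.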

      
