How to randomly select rows from a dataset using pandas?
I have a dataset with 36k rows. I want to randomly select 9k rows from it using pandas. How can I do this?
Niranjan Agnihotri
2 answers
I think you can use sample, passing either n (9k rows) or frac (25% of the rows):
df.sample(n=9000)
Or:
df.sample(frac=0.25)
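A small self-contained sketch of both calls on a made-up 36k-row frame (the columns "a" and "b" are hypothetical). It also passes random_state, which makes the draw reproducible. sample draws without replacement by default, so no row appears twice:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 36k-row dataset from the question.
df = pd.DataFrame({"a": np.arange(36_000), "b": np.arange(36_000) * 2})

# random_state makes the draw reproducible; sampling is without replacement by default.
by_count = df.sample(n=9000, random_state=42)
by_fraction = df.sample(frac=0.25, random_state=42)  # 0.25 * 36000 = 9000 rows

print(len(by_count), len(by_fraction))  # 9000 9000
print(by_count.index.is_unique)         # True
```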
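Another solution is to draw a random sample of the index with numpy.random.choice and then select those rows with loc. This requires the index to be unique. Also pass replace=False, because choice samples with replacement by default and would otherwise return duplicate rows:
import numpy as np
df = df.loc[np.random.choice(df.index, size=9000, replace=False)]
If the index is not unique, sample positions instead and select with iloc:
df = df.iloc[np.random.choice(np.arange(len(df)), size=9000, replace=False)]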
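A sketch illustrating why replace=False matters and why iloc handles a non-unique index. It assumes a made-up frame whose index labels each appear twice. It also uses NumPy's newer Generator API (np.random.default_rng) instead of the legacy np.random.choice. The choice semantics are the same, but the generator can be seeded:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a deliberately non-unique index (every label appears twice).
df = pd.DataFrame({"value": np.arange(36_000)}, index=np.repeat(np.arange(18_000), 2))

rng = np.random.default_rng(0)  # seeded Generator for reproducibility

# Without replace=False, choice() draws with replacement, so positions repeat.
with_repl = rng.choice(len(df), size=9000)
print(len(np.unique(with_repl)) < 9000)  # True: 9000 draws from 36000 collide

# Draw 9000 distinct positions and select by position; duplicate labels don't matter.
positions = rng.choice(len(df), size=9000, replace=False)
sample = df.iloc[positions]
print(len(sample), sample["value"].is_unique)  # 9000 True
```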
jezrael
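Using numpy: shuffle all row positions with numpy.random.permutation and keep the first 9000, which are guaranteed to be distinct. Pass columns as well, or the result loses the column names:
import numpy as np
import pandas as pd
i = np.random.permutation(len(df))
idx = i[:9000]
pd.DataFrame(df.values[idx], index=df.index[idx], columns=df.columns)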
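One caveat of rebuilding from df.values: on a mixed-dtype frame it yields a single object array, so the per-column dtypes are lost. A sketch of the same permutation idea that selects with iloc instead is below. The frame and its columns "id" and "label" are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame standing in for the real dataset.
df = pd.DataFrame({"id": np.arange(36_000), "label": ["x", "y"] * 18_000})

rng = np.random.default_rng(1)
perm = rng.permutation(len(df))  # positions 0..len(df)-1, each exactly once, shuffled
sample = df.iloc[perm[:9000]]    # first 9000 shuffled positions -> 9000 distinct rows

print(df.values.dtype)                   # object (mixed columns collapse to object)
print(sample.shape)                      # (9000, 2)
print(sample.dtypes.equals(df.dtypes))   # True: iloc keeps per-column dtypes
```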
piRSquared