How to randomly select rows from a dataset using pandas?
2 answers
I think you can use DataFrame.sample to select either 9,000 rows or 25% of the rows:
df.sample(n=9000)
Or:
df.sample(frac=0.25)
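As a small illustration of both calls, here is a sketch on a toy frame. The 100-row frame, the column name `value`, and the seed are assumptions made up for the example; `random_state` is optional and only makes the draw repeatable.

```python
import pandas as pd

# Hypothetical toy data; the column name "value" is made up for illustration.
df = pd.DataFrame({"value": range(100)})

# Fixed number of rows, sampled without replacement (the default).
by_count = df.sample(n=10, random_state=0)

# Fraction of the rows: 25% of 100 rows gives 25 rows.
by_frac = df.sample(frac=0.25, random_state=0)

print(len(by_count), len(by_frac))  # 10 25
```

Because sample draws without replacement by default, no row appears twice in either result. Passing n larger than the number of rows raises an error unless replace=True is set.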
Another solution is to draw a random sample of index labels with numpy.random.choice (numpy imported as np) and then select those rows with loc. The index has to be unique for this to work:
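# replace=False avoids picking the same row twice (np.random.choice samples with replacement by default)
df = df.loc[np.random.choice(df.index, size=9000, replace=False)]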
Solution if the index is not unique (select by position with iloc instead):
df = df.iloc[np.random.choice(np.arange(len(df)), size=9000, replace=False)]
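To show why position-based selection matters when labels repeat, here is a minimal sketch. The toy frame, its duplicated index, and the seed are assumptions for illustration. It uses NumPy's newer Generator API (np.random.default_rng) rather than np.random.choice, only so the draw can be seeded locally.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels repeat (0..4 appear twice each).
df = pd.DataFrame({"value": range(10)}, index=[0, 1, 2, 3, 4] * 2)

rng = np.random.default_rng(0)  # seeded only so the example is repeatable

# Draw 4 distinct row positions, then select by position with iloc.
positions = rng.choice(len(df), size=4, replace=False)
subset = df.iloc[positions]

print(len(subset))  # 4
```

With loc, each sampled label would pull in every row that carries that label, so the result could contain more rows than requested. Selecting by position avoids that.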