How to reuse the same random sample across multiple files in Python pandas?
Below is code that reads a CSV file and takes a random sample of 700 rows from it. I need to do this for multiple files, but if I iterate over the files, a new random sample is drawn for each one. I want the sample, once it has been randomly generated, to stay the same (same rows, same order) for every file.
import pandas as pd
df = pd.read_csv('file.csv', delim_whitespace=True)
df_s = df.sample(n=700)
My idea is to take the row numbers of the first sample and pass them on to the next file, but that doesn't seem very elegant.
Do you know any good solutions to this problem?
EDIT
The files have different lengths, but every file has at least 750 rows.
Example of the desired result:
df1 = pd.read_csv('file1.csv', delim_whitespace=True)
df_s1 = df1.sample(n=700)  # choose random sample
df2 = pd.read_csv('file2.csv', delim_whitespace=True)
df_s2 = df2.sample(n=700)  # should use the same random sample as above
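To make the goal concrete, here is a rough sketch of one way it might work: draw the 700 row positions once, limited to the length of the shortest file, and reuse them with `iloc` for every file. The helper name `sample_same_rows`, the seed, and the small in-memory frames standing in for the CSV files are all assumptions for illustration, not part of my real code.

```python
import numpy as np
import pandas as pd


def sample_same_rows(frames, n=700, seed=0):
    """Return one sample per frame, all taken at the same row positions."""
    # The positions must be valid for every frame, so draw them from the shortest one.
    min_len = min(len(df) for df in frames)
    rng = np.random.default_rng(seed)
    positions = rng.choice(min_len, size=n, replace=False)
    # iloc selects by position, so each frame yields the same rows in the same order.
    return [df.iloc[positions] for df in frames]


# Hypothetical stand-ins for the CSV files, with different lengths (minimum 750).
df1 = pd.DataFrame({"a": np.arange(1000)})
df2 = pd.DataFrame({"a": np.arange(750) * 10})
df_s1, df_s2 = sample_same_rows([df1, df2])
```

One consequence of this approach is that rows beyond the length of the shortest file are never sampled. Passing the same `random_state` to `df.sample` would only give matching positions if all files had the same number of rows, which is not the case here.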