Randomly select rows from file based on time in columns

It's a little tricky and I really appreciate any help! I am trying to randomly sample lines from a CSV file. Basically, I want to get the resulting unique locations file (locations are specified by columns Easting

and Northing

data file, below). I want to randomly get 1st place in the 12-hour period for SessionDate

the file (12-hour period, divided into: between 0631

and 1829

hours and between 1830

and 0630

hours, taking into account both Start:

, and End:

in the data file below); BUT if any 2 locations are within 6 hours of each other (depending on their timeStart:

), in order for this location to be selected, and a new location to be randomly drawn, and for this sampling to continue until new locations are drawn (i.e., sampling without replacement). I am trying to do this with python, but my experience is very limited. I've tried putting each line into a dictionary first, and recently each line into a list:

import random
import csv

f = open('file.csv', "U")
list = []

for line in f:
    list.append(line.split(','))

      

I'm not sure where to go from here - how to sample from these lists the way I need to and then write them to the output file with my "unique" locations.

Here are the first few lines of my data file:

SessionDate Start:  End:    Easting Northing
27-Apr-07   18:00   21:45   174739  9785206
28-Apr-07   18:00   21:30   171984  9784738
28-Apr-07   18:00   21:30   171984  9784738
28-Apr-07   18:00   21:30   171984  9784738
28-Apr-07   18:00   21:30   171984  9784738

      

It gets a little more complicated as some of the observations span midnight, so they might be on different dates, but might be within 6 hours of each other (which is why I have this criterion), for example:

SessionDate Start:  End:    Easting Northing
27-Apr-07   22:30   23:25   171984  9784738
28-Apr-07   0:25    1:30    174739  9785206

      

+3


source to share


2 answers


Here's my solution - I made a few changes to your data (location to make it easier to see the results). I am basically creating dict

dates pointing to other dict

locations that point to a list of selected rows.



data  = """SessionDate Start:  End:    Easting Northing
27-Apr-07   18:00   21:45   A  1
27-Apr-07   18:00   21:30   G  2
28-Apr-07   18:00   21:30   B  2
28-Apr-07   18:00   21:30   B  2
28-Apr-07   18:00   21:30   B  2
29-Apr-07   8:00    11:30   C 3
29-Apr-07   20:00   21:30   C  3
29-Apr-07   20:00   21:30   C  3
30-Apr-07   8:00   10:30   D  4
30-Apr-07   16:00   17:30   E  5
30-Apr-07   14:00   21:30   F  6
30-Apr-07   18:00   21:30   F  6
"""

selected = {}
for line in data.split("\n"):
    if "Session" in line:
        continue
    if not line:
        continue

    tmp = [x for x in line.split() if x]
    raw_dt = " ".join([tmp[0], tmp[1]]).strip()
    curr_dt = datetime.strptime(raw_dt, "%d-%b-%y %H:%M")
    loc = (tmp[-2], tmp[-1])

    found = False
    for dt in selected:
        diff = dt - curr_dt
        if dt < curr_dt:
            diff = curr_dt - dt
        # print dt, curr_dt, diff, diff <= timedelta(hours=12), loc, loc in selected[dt]            
        if diff <= timedelta(hours=12):
            if loc not in selected[dt]:
                selected[dt].setdefault(loc, []).append(tmp)
                found = True
            else:
                found = True
    if not found:
        if curr_dt not in selected:
            selected[curr_dt] = {}
        if loc not in selected[curr_dt]:
            selected[curr_dt][loc] = [tmp,]

# if output needs to be sorted
rows = sorted(x for k in selected for l in selected[k] for x in selected[k][l])
for row in rows:
    print " ".join(row)

      

0


source


This is not a complete answer, but something that points you in the right direction.

As I said in a comment, the handling of datetime objects in python is done using the datetime module . Here's a small example related to your problem:



from datetime import datetime

d1 = datetime.strptime("27-Apr-07 18:00", "%d-%b-%y %H:%M")
d2 = datetime.strptime("28-Apr-07 01:00", "%d-%b-%y %H:%M")

difference = d2 - d1

#Difference in hours
dH = difference.days*24 + difference.seconds/3600

      

Other than that, just scroll through the sorted file, after reading the entire 12H block, ramdomly sample, make sure your unique condition is met (if not repeated) and go.

-1


source







All Articles