Continuous analysis of CSV files that are updated by another process

Suppose I have several CSV files, for example:

file1.csv, file2.csv, file3.csv

and another process updates them periodically by appending data to the last line of each file.

I know it is possible to read the data from a CSV file and store it in an array or a collection such as a deque.

Is there a way to keep collecting data from a CSV file as it is updated?

2 answers


You can use a Python package called Watchdog.

This example recursively watches the current directory for file system changes and logs them to the console:

import logging
import time

from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
    # LoggingEventHandler logs at INFO level, so configure logging to show it.
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path='.', recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)  # keep the main thread alive; the observer runs in its own thread
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

You can use this in conjunction with Ignacio's answer: use file_pointer.tell() to record the current position in the file, then seek() to that position next time and read only the rest of the file. For example:

# First read: consume the whole file and remember where it ended.
with open('current.csv', 'r') as f:
    data = f.readlines()
    last_pos = f.tell()

# Later reads: jump past the data already seen and read only what is new.
with open('current.csv', 'r') as f:
    f.seek(last_pos)
    new_data = f.readlines()
    last_pos = f.tell()
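Putting the two ideas together, you can let Watchdog signal when a file changes and use tell()/seek() to read only the appended rows into a deque. Below is a minimal sketch assuming a single file named file1.csv in the current directory; the CsvTailHandler class name and the maxlen bound are illustrative, not part of the Watchdog library:

import csv
import os
import time
from collections import deque

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class CsvTailHandler(FileSystemEventHandler):
    """Appends newly written CSV rows to a deque whenever the file grows."""

    def __init__(self, path, rows):
        self.path = os.path.abspath(path)
        self.rows = rows
        # Start at the current end of the file so only future rows are collected.
        with open(self.path, 'r', newline='') as f:
            f.seek(0, os.SEEK_END)
            self.last_pos = f.tell()

    def on_modified(self, event):
        if os.path.abspath(event.src_path) != self.path:
            return
        with open(self.path, 'r', newline='') as f:
            f.seek(self.last_pos)
            new_lines = f.readlines()  # only the data appended since the last read
            self.last_pos = f.tell()
        for row in csv.reader(new_lines):
            self.rows.append(row)

if __name__ == "__main__":
    rows = deque(maxlen=1000)  # bounded collection; adjust or drop maxlen as needed
    observer = Observer()
    observer.schedule(CsvTailHandler('file1.csv', rows), path='.', recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()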

      


Compare the current file size with the current offset in the file. If the size is larger, read the new data.
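For example, a simple polling loop built on os.path.getsize() and the same tell()/seek() trick; the file name and the one-second interval here are only placeholders:

import csv
import os
import time
from collections import deque

rows = deque()
path = 'file1.csv'  # placeholder file name from the question
last_pos = 0

while True:
    # The file has grown only if its size exceeds the offset we last read to.
    if os.path.getsize(path) > last_pos:
        with open(path, 'r', newline='') as f:
            f.seek(last_pos)
            new_lines = f.readlines()
            last_pos = f.tell()
        for row in csv.reader(new_lines):
            rows.append(row)
    time.sleep(1)  # polling interval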


