Continuous analysis of CSV files that are updated by another process

Suppose I have several CSV files, for example:

file1.csv, file2.csv, file3.csv

and another process updates them periodically by appending data to the last line of each file.

I know it is possible to read the data from a CSV file and store it in an array or a collection such as a deque.

Is there a way to keep collecting data from a CSV file as it is updated?

2 answers


You can use a Python package called Watchdog.

This example recursively watches the current directory for file system changes and logs them to the console:

import logging
import time

from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
    # LoggingEventHandler logs at INFO level, so configure logging to show it.
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path='.', recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)  # keep the main thread alive; the observer runs in its own thread
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

You can use this in conjunction with Ignacio's answer: use file_pointer.tell() to record the current position in the file, then seek() to that position next time and read only the rest of the file. For example:

# First read: consume the whole file and remember where it ended.
with open('current.csv', 'r') as f:
    data = f.readlines()
    last_pos = f.tell()

# Later reads: jump past the data already seen and read only what is new.
with open('current.csv', 'r') as f:
    f.seek(last_pos)
    new_data = f.readlines()
    last_pos = f.tell()
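Putting the two ideas together, you can let Watchdog signal when a file changes and use tell()/seek() to read only the appended rows into a deque. Below is a minimal sketch assuming a single file named file1.csv in the current directory; the CsvTailHandler class name and the maxlen bound are illustrative, not part of the Watchdog library:

import csv
import os
import time
from collections import deque

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class CsvTailHandler(FileSystemEventHandler):
    """Appends newly written CSV rows to a deque whenever the file grows."""

    def __init__(self, path, rows):
        self.path = os.path.abspath(path)
        self.rows = rows
        # Start at the current end of the file so only future rows are collected.
        with open(self.path, 'r', newline='') as f:
            f.seek(0, os.SEEK_END)
            self.last_pos = f.tell()

    def on_modified(self, event):
        if os.path.abspath(event.src_path) != self.path:
            return
        with open(self.path, 'r', newline='') as f:
            f.seek(self.last_pos)
            new_lines = f.readlines()  # only the data appended since the last read
            self.last_pos = f.tell()
        for row in csv.reader(new_lines):
            self.rows.append(row)

if __name__ == "__main__":
    rows = deque(maxlen=1000)  # bounded collection; adjust or drop maxlen as needed
    observer = Observer()
    observer.schedule(CsvTailHandler('file1.csv', rows), path='.', recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()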

      


Compare the current file size with the current offset in the file. If the size is larger, read the new data.
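For example, a simple polling loop built on os.path.getsize() and the same tell()/seek() trick; the file name and the one-second interval here are only placeholders:

import csv
import os
import time
from collections import deque

rows = deque()
path = 'file1.csv'  # placeholder file name from the question
last_pos = 0

while True:
    # The file has grown only if its size exceeds the offset we last read to.
    if os.path.getsize(path) > last_pos:
        with open(path, 'r', newline='') as f:
            f.seek(last_pos)
            new_lines = f.readlines()
            last_pos = f.tell()
        for row in csv.reader(new_lines):
            rows.append(row)
    time.sleep(1)  # polling interval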


