Don't understand Python csv.reader object

I ran into behavior in Python's built-in csv module that I had never noticed before. Typically when I read a csv file I follow the docs pretty much verbatim, using 'with' to open the file and then looping over the reader object with a 'for' loop. However, I recently tried iterating over the csv.reader object twice in a row, only to find that the second 'for' loop did nothing.

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')

    for line in readit:
        print line

    for line in readit:
        print 'foo'

      

Console output:

Austins-iMac:Desktop austin$ python -i amy.py 
['Amy', 'James', 'Nathan', 'Sara', 'Kayley', 'Alexis']
['James', 'Nathan', 'Tristan', 'Miles', 'Amy', 'Dave']
['Nathan', 'Amy', 'James', 'Tristan', 'Will', 'Zoey']
['Kayley', 'Amy', 'Alexis', 'Mikey', 'Sara', 'Baxter']
>>>
>>> readit
<_csv.reader object at 0x1023fa3d0>
>>> 

      

So the second 'for' loop basically does nothing. I thought the csv.reader object might be freed from memory after being read once, but that is not the case, since it still has its memory address. I found a post that mentions a similar issue. The explanation given there was that once the object has been read, the pointer stays at the end of the memory address, ready to write data to the object. Is that right? Can anyone explain in more detail what's going on here? Is there a way to move the pointer back to the beginning of the memory address so I can re-read it? I know this is a bad way to code, but I'm mostly just curious and want to know more about what's going on under the hood in Python.

Thanks!

+3




3 answers


I'll try to answer your other questions about what the reader is doing and why reset() or seek(0) might help. In its most basic form, a csv reader might look something like this:

def csv_reader(it):
    for line in it:
        yield line.strip().split(',')

      

That is, it takes any iterator that produces strings and gives you a generator. All it does is take an element from your iterator, process it, and return the result. When the underlying iterator (it) is consumed, csv_reader finishes as well. The reader doesn't know where the iterator came from or how to properly make a new one, so it doesn't even try to reset it. That is left to the programmer.
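
To see that exhaustion in action, here is a minimal sketch (written for Python 3) that runs the simplified csv_reader over a small in-memory list. The sample rows and variable names are made up for illustration, and the real csv.reader does more work (quoting, dialects), but it runs dry in the same way once its input is used up.

```python
def csv_reader(it):
    # Simplified stand-in for csv.reader, not the real implementation
    for line in it:
        yield line.strip().split(',')

rows = ['Amy,James,Nathan', 'James,Nathan,Tristan']  # made-up sample data
source = iter(rows)            # a one-shot iterator, much like an open file
reader = csv_reader(source)

first_pass = list(reader)      # consumes every line from source
second_pass = list(reader)     # the generator has finished; nothing is left

print(first_pass)
print(second_pass)

# A fresh iterator plus a fresh reader yields the data again
third_pass = list(csv_reader(iter(rows)))
print(third_pass)
```

The reader object itself still exists after the first pass (which is why it still shows a memory address); it simply has no more input to hand out.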



We can either modify the iterator in place without the reader knowing, or just make a new reader. Here are some examples to demonstrate the point.

import csv

# Assumes data.csv has at least 7 lines
data = open('data.csv', 'r')
reader = csv.reader(data)

print(next(reader))               # Parse the first line
[next(data) for _ in range(5)]    # Skip the next 5 lines on the underlying iterator
print(next(reader))               # This will be the 7th line in data
print(reader.line_num)            # reader thinks this is the 2nd line
data.seek(0)                      # Go back to the beginning of the file
print(next(reader))               # Gives the first line again
data.close()

data = ['1,2,3', '4,5,6', '7,8,9']
reader = csv.reader(data)         # Works fine on lists of strings too
print(next(reader))               # ['1', '2', '3']

      

In general, if you need a second pass, your best bet is to close and reopen the file and create a new csv reader. It's clean and keeps the bookkeeping straightforward.
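
As a rough sketch of that reopen-and-recreate pattern (Python 3; the temporary file and its contents are made up so the example is self-contained, and newline='' follows the csv documentation's recommendation for opening files):

```python
import csv
import os
import tempfile

# Write a small throwaway CSV so the example can run anywhere
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('1,2,3\n4,5,6\n')
    path = tmp.name

# First pass: count the rows
with open(path, newline='') as f:
    row_count = sum(1 for _ in csv.reader(f))

# Second pass: reopen the file and build a new reader to get the rows again
with open(path, newline='') as f:
    rows = [row for row in csv.reader(f)]

os.remove(path)
print(row_count, rows)
```

Each pass gets its own file object and its own reader, so neither depends on where the other left off.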

+3




Iterating over a csv reader just ends up iterating over the lines of the underlying file object. On each iteration, the reader gets the next line from the file, parses it, and returns it.

Thus, iterating over a csv reader follows the same conventions as iterating over files. That is, once the file reaches its end, you have to seek back to the start before iterating over it a second time.



The following should work, although I haven't tested it:

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')

    for line in readit:
        print line

    # go back to the start of the file
    csvfile.seek(0)

    for line in readit:
        print 'foo'
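
The same idea can be checked without a file on disk. This is a minimal sketch for Python 3 that uses io.StringIO as a stand-in for the open file; the sample rows are made up, and a real file object is assumed to behave the same way with respect to seek(0).

```python
import csv
import io

# In-memory stand-in for the CSV file; the contents are made up
csvfile = io.StringIO('Amy,James\nNathan,Sara\n')
readit = csv.reader(csvfile, delimiter=',')

first_pass = list(readit)
exhausted = list(readit)      # the file is at its end, so nothing comes back

csvfile.seek(0)               # rewind the underlying file object
second_pass = list(readit)    # the same reader now reads from the start again

print(first_pass)
print(exhausted)
print(second_pass)
```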

      

+1




If it's not too much data, you can always read it into a list:

import csv

with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')
    csvdata = list(readit)

    for line in csvdata:
        print line

    for line in csvdata:
        print 'foo'

      

0

