Don't understand Python csv.reader object
I ran into behavior in Python's built-in csv module that I had never noticed before. Typically when I read a csv file I follow the docs pretty much verbatim, using 'with' to open the file and then looping over the reader object with a 'for' loop. However, I recently tried iterating over the csv.reader object twice in a row, only to find that the second 'for' loop did nothing.
import csv
with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')
    for line in readit:
        print line
    for line in readit:
        print 'foo'
Console output:
Austins-iMac:Desktop austin$ python -i amy.py
['Amy', 'James', 'Nathan', 'Sara', 'Kayley', 'Alexis']
['James', 'Nathan', 'Tristan', 'Miles', 'Amy', 'Dave']
['Nathan', 'Amy', 'James', 'Tristan', 'Will', 'Zoey']
['Kayley', 'Amy', 'Alexis', 'Mikey', 'Sara', 'Baxter']
>>>
>>> readit
<_csv.reader object at 0x1023fa3d0>
>>>
So the second 'for' loop basically does nothing. My first thought was that the csv.reader object is freed from memory after being read once. That is not the case, since it still has its memory address. I found a post that mentions a similar issue. The reason given there was that after the object is read, the pointer stays at the end of the memory address, ready to write data to the object. Is that right? Can anyone explain in more detail what is going on here? Is there a way to move the pointer back to the beginning of the memory address so the object can be read again? I know this is a bad way of coding, but I'm mostly just curious and want to know more about what goes on under the hood in Python.
Thanks!
I'll try to answer your other questions about what the reader is doing and why reset() or seek(0) might help. In its most basic form, a csv reader might look something like this:
def csv_reader(it):
    for line in it:
        yield line.strip().split(',')
That is, it takes any iterator that produces strings and gives you a generator. All it does is take an element from your iterator, process it, and return the result. When it (the underlying iterator) is consumed, csv_reader finishes too. The reader doesn't know where the iterator came from or how to make a new one properly, so it doesn't even try to reset it. That is left to the programmer.
We can either modify the iterator in place without the reader knowing, or just make a new reader. Here are some examples to demonstrate the point.
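To see the exhaustion concretely, here is a minimal Python 3 sketch built on the simplified csv_reader generator above. It is a stand-in for illustration, not the real csv.reader implementation, and the sample rows are made up.

```python
def csv_reader(it):
    # Simplified stand-in for csv.reader: one parsed row per input line.
    for line in it:
        yield line.strip().split(',')

lines = iter(['a,b,c\n', 'd,e,f\n'])  # hypothetical sample data
reader = csv_reader(lines)

first_pass = list(reader)   # consumes the underlying iterator
second_pass = list(reader)  # nothing left to read, so this is empty

print(first_pass)
print(second_pass)
```

Note that this toy generator is finished for good once its input runs out. The real csv.reader in CPython behaves a little differently: it asks its input for another line on every call, so it can pick up again if the input is rewound.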
import csv

data = open('data.csv', 'r')
reader = csv.reader(data)
print(next(reader))  # Parse the first line
[next(data) for _ in range(5)]  # Skip the next 5 lines on the underlying iterator
print(next(reader))  # This will be the 7th line in data
print(reader.line_num)  # reader thinks this is the 2nd line
data.seek(0)  # Go back to the beginning of the file
print(next(reader))  # gives first line again
data.close()

data = ['1,2,3', '4,5,6', '7,8,9']
reader = csv.reader(data)  # works fine on lists of strings too
print(next(reader))  # ['1', '2', '3']
In general, if you need a second pass, your best bet is to close and reopen the file and use a new csv reader. It's clean and keeps the bookkeeping (such as line_num) accurate.
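A minimal Python 3 sketch of the reopen approach might look like the following. The helper read_rows and the temporary file are hypothetical and exist only to keep the example self-contained.

```python
import csv
import os
import tempfile

# Write a small throwaway CSV file so the example runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('Amy,James,Nathan\nJames,Nathan,Tristan\n')
    path = tmp.name

def read_rows(path):
    # Hypothetical helper: open the file fresh and parse every row.
    with open(path, newline='') as f:
        return list(csv.reader(f))

first_pass = read_rows(path)
second_pass = read_rows(path)  # a new file object and a new reader

os.remove(path)  # clean up the temporary file
print(first_pass == second_pass)
```

Each pass gets its own file object and its own reader, so neither depends on where the other one stopped.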
Iterating over the csv reader just ends up iterating over the lines in the underlying file object. At each iteration, the reader gets the next line from the file, converts it, and returns it.
Thus, iterating over a csv reader follows the same conventions as iterating over files. That is, once the file reaches its end, you have to seek back to the start before iterating over it a second time.
The following should work, although I haven't tested it:
import csv
with open('smallfriends.csv','rU') as csvfile:
    readit = csv.reader(csvfile,delimiter=',')
    for line in readit:
        print line
    # go back to the start of the file
    csvfile.seek(0)
    for line in readit:
        print 'foo'
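The same idea as a small self-contained Python 3 sketch. Here io.StringIO stands in for the open file, and the rows are sample data rather than the contents of smallfriends.csv.

```python
import csv
import io

# io.StringIO stands in for an open file; the contents are sample data.
csvfile = io.StringIO('Amy,James\nNathan,Sara\n')
readit = csv.reader(csvfile, delimiter=',')

first_pass = [row for row in readit]

csvfile.seek(0)  # rewind the underlying file-like object

second_pass = [row for row in readit]  # the same reader yields rows again

print(first_pass)
print(second_pass)
print(readit.line_num)  # not reset by the seek; it keeps counting
```

The seek rewinds only the file. The reader's own line counter is not reset, which is one reason a fresh reader is usually the tidier choice.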