Undo the last iteration of a line in a file
I need to iterate over a file, stop iterating on a condition, and then continue parsing the file on the same line with another function (this might change, so I can't just add content to the previous function).
Example file (file.txt):
1
2
3
4
5
6
7
8
9
The functions I'm trying to write:
def parse1(file, stop):
    # 1st parsing function (Main function I am doing)
    for line in file:
        if line.strip() == stop:
            # Stop parsing on condition
            break
        else:
            # Parse the line (just print for example)
            print(line)

def parse2(file):
    # 2nd parsing function (Will be my own functions or external functions)
    for line in file:
        # Parse the line (just print for example)
        print(line)
Result in terminal:
>>> file = open("file.txt")
>>> parse1(file, "4")
1
2
3
>>> parse2(file)
5
6
7
8
9
My problem is that line "4" is skipped: the first function consumes it while checking the stop condition, so the second function never sees it.
How can I avoid this? I have not found a way to undo the last iteration or go back one line.
file.tell() does not work while iterating over the file with for.
I tried a while loop with file.readline(), but it is much slower than a for loop over the file (and I want to parse files with millions of lines).
Is there an elegant solution that keeps the for loop?
In Python 3, the construct "for line in file" is driven by an internal iterator. By definition, a value that has been produced by an iterator cannot be "put back" for later use ( http://www.diveintopython3.net/iterators.html ).
To get the behavior you want, you need a function that concatenates two iterators, such as the chain function provided by itertools. When parse1 reaches the stop condition, it returns the last line together with the file iterator:
import itertools

def parse1(file, stop):
    # 1st parsing function
    for line in file:
        # Stop parsing on condition
        if line.strip() == stop:
            return itertools.chain([line], file)  # important line
        else:
            # Parse the line (just print for example)
            print('parse1: ' + line)
The chain operator links two iterators. The first iterator contains only one element: the line that you want to process again. The second iterator is the rest of the file. Once the first iterator ends, the second iterator is accessed.
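To see the "put back" effect in isolation, here is a minimal sketch. It uses io.StringIO as a stand-in for an open text file (an assumption made only so the example runs without a file on disk); the names fake_file and remaining are illustrative.

```python
import io
import itertools

# io.StringIO stands in for an open text file (assumption for this demo).
fake_file = io.StringIO("1\n2\n3\n4\n5\n")

# Consume lines until the stop line "3" has been read from the iterator.
for line in fake_file:
    if line.strip() == "3":
        break

# Put the consumed line back in front of the lines that are still unread.
rest = itertools.chain([line], fake_file)
remaining = [item.strip() for item in rest]
print(remaining)  # ['3', '4', '5']
```

The chained iterator is lazy: the rest of the file is only read as the consumer asks for more lines.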
You don't need to change parse2. For clarity, I changed the print statement:
def parse2(file):
    # 2nd parsing function
    for line in file:
        # Parse the line (just print for example)
        print('parse2: ' + line)
Then you can call parse1 and parse2 in the most functional way:
with open('testfile', 'r') as infile:
    parse2(parse1(infile, '4'))
The output of the line above is:
parse1: 1
parse1: 2
parse1: 3
parse2: 4
parse2: 5
parse2: 6
parse2: 7
parse2: 8
parse2: 9
Notice that the value '4' is now printed by parse2.
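One edge case worth noting: if the stop line never appears, a parse1 that only returns inside the loop falls off the end and returns None, and iterating over None in parse2 raises a TypeError. A possible guard, sketched here under the assumption that an empty iterator is an acceptable result, is shown below. The name parse1_safe and the handle parameter are hypothetical and only added for illustration.

```python
import io
import itertools

def parse1_safe(file, stop, handle=print):
    """Hypothetical variant of parse1 that always returns an iterator."""
    for line in file:
        if line.strip() == stop:
            # Hand back the stop line followed by the unread lines.
            return itertools.chain([line], file)
        handle(line)
    # Stop line never found: return an empty iterator instead of None.
    return iter(())

seen = []
leftover = parse1_safe(io.StringIO("1\n2\n3\n"), "9", handle=seen.append)
print(seen)            # ['1\n', '2\n', '3\n']
print(list(leftover))  # []
```

With this guard, parse2(parse1_safe(infile, '4')) simply does nothing in its loop when the stop line is missing.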
I suggest making a copy (1) of your file object, looping over the copy, and advancing the original in the else block, then calling the second function from inside the first. The more Pythonic way to open the file is with, which closes the file at the end of the block:
#ex.txt
1
2
3
4
5
6
7
8
9
10
You can use itertools.tee to create a copy (1) of your file object:
from itertools import tee

def parse1(file_name, stop):
    def parse2(file_obj):
        print('**********')
        for line in file_obj:
            print(line)

    with open(file_name) as file_obj:
        # temp is used to look ahead; file_obj stays one line behind it
        temp, file_obj = tee(file_obj)
        for line in temp:
            if line.strip() == stop:
                break
            else:
                next(file_obj)
                print(line)
        parse2(file_obj)

parse1("ex.txt", '4')
result:
1
2
3
**********
4
5
6
7
8
9
10
1) itertools.tee doesn't actually create a copy, but you can use it for this purpose. According to the documentation, it returns n independent iterators from a single iterable, so you can assign one of these iterators back to the name of the object being iterated over and use the other one as temp.
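The independence of the two iterators can be checked with a short sketch. io.StringIO again stands in for a real file (an assumption for the demo), and the names first, second, a and b are illustrative.

```python
import io
from itertools import tee

# io.StringIO stands in for an open text file (assumption for this demo).
source = io.StringIO("1\n2\n3\n")

# tee returns two iterators that each yield every line of the source.
first, second = tee(source)

# Advance the first iterator by two lines; the second is unaffected.
a = [next(first).strip(), next(first).strip()]
b = next(second).strip()
print(a, b)  # ['1', '2'] 1
```

Two caveats from the itertools documentation: after calling tee, the original iterable should not be used directly, and tee buffers every item that one iterator has consumed and the other has not. In the answer's code the two iterators advance in lockstep, so that buffer stays small even for large files.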
IMHO, the simplest solution is for the first parser to return the line on which it found the stop condition, and to pass that line on to the second parser. The second one should have an explicit function that parses a single line, to avoid code duplication:
def parse1(file, stop):
    # 1st parsing function (Main function I am doing)
    for line in file:
        if line.strip() == stop:
            # Stop parsing on condition
            return line
        else:
            # Parse the line (just print for example)
            print(line)
    return None

def parse2(file, line=None):
    # 2nd parsing function (Will be my own functions or external functions)
    def doParse(line):
        # do actual parsing (just print for example)
        print(line)
    if line is not None:
        # First handle the line on which parse1 stopped
        doParse(line)
    for line in file:
        doParse(line)

# main
...
stop = parse1(file, "4")  # "4" is the stop value from the question
if stop:
    parse2(file, stop)
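A runnable sketch of this approach follows. It uses io.StringIO in place of a real file and a hypothetical handle callback instead of print, so that the lines seen by each parser can be inspected. Both are assumptions made for the demo, not part of the answer above.

```python
import io

def parse1(file, stop, handle):
    # Process lines until the stop line; return that line unprocessed.
    for line in file:
        if line.strip() == stop:
            return line
        handle(line)
    return None

def parse2(file, line=None, handle=print):
    # Process the handed-over line first, then the rest of the file.
    if line is not None:
        handle(line)
    for line in file:
        handle(line)

first_part, second_part = [], []
data = io.StringIO("1\n2\n3\n4\n5\n")  # stand-in for an open file
stop_line = parse1(data, "4", first_part.append)
if stop_line is not None:
    parse2(data, stop_line, second_part.append)

print([s.strip() for s in first_part])   # ['1', '2', '3']
print([s.strip() for s in second_part])  # ['4', '5']
```

The trade-off compared with chaining iterators is that parse2 must accept the extra line argument, which may not be possible for external functions.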