Undo the last iteration of a line in a file

I need to iterate over a file, stop iterating on a condition, and then continue parsing the file from that same line with another function (the second function may change, so I can't just move its logic into the first one).

Example file (file.txt):

1
2
3
4
5
6
7
8
9

      

The functions I'm trying to write:

def parse1(file, stop):
    # 1st parsing function (Main function I am doing)
    for line in file:
        if line.strip() == stop:
            # Stop parsing on condition
            break
        else:
            # Parse the line (just print for example)
            print(line)

def parse2(file):
    # 2nd parsing function (Will be my own functions or external functions)
    for line in file:
        # Parse the line (just print for example)
        print(line)

      

Result in terminal:

>>> file = open("file.txt")

>>> parse1(file, "4")
1
2
3

>>> parse2(file)
5
6
7
8
9

      

My problem is that line "4" is consumed by the first function when it checks the stop condition, so the second function never sees it.

How can I avoid this? I have not found a way to undo the last iteration or step back one line.

file.tell() does not work inside a for loop over the file.

I tried a while loop with file.readline(), but it is much slower than a for loop over the file (and I want to parse files with millions of lines).

Is there an elegant solution that keeps the for loop?

+3




3 answers


In Python 3, the construct "for line in file" is driven by an internal iterator. By definition, a value that has already been produced by an iterator cannot be "put back" for later use ( http://www.diveintopython3.net/iterators.html ).

To get the behavior you want, you need a function that concatenates two iterators, such as the chain function provided by itertools. When the stop condition is met, parse1 returns the last line chained with the file iterator:

import itertools

def parse1(file, stop):
    # 1st parsing function
    for line in file:
        # Stop parsing on condition
        if line.strip() == stop:
            return itertools.chain([line], file)  # important line
        else:
            # Parse the line (just print for example)
            print('parse1: ' + line)
    # Stop line never found: return an empty iterator so parse2 still works
    return iter(())

      

The chain function links two iterators. The first iterator contains only one element: the line that you want to process again. The second iterator is the rest of the file. Once the first iterator is exhausted, iteration continues with the second.
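
To see the "put back" effect in isolation, here is a minimal sketch that uses an in-memory io.StringIO object as a stand-in for a real file. The helper name split_at and the sample data are hypothetical and only serve as illustration; the helper collects lines instead of printing them so the result is easy to inspect.

```python
import io
from itertools import chain

def split_at(lines, stop):
    """Consume lines until one equals stop.

    Returns (stripped lines seen before the stop line, iterator over the rest).
    """
    seen = []
    for line in lines:
        if line.strip() == stop:
            # Put the stop line back in front of the untouched remainder.
            return seen, chain([line], lines)
        seen.append(line.strip())
    # Stop line never found: nothing is left to iterate over.
    return seen, iter(())

source = io.StringIO("1\n2\n3\n4\n5\n")  # in-memory stand-in for a real file
before, rest = split_at(source, "4")
remaining = [line.strip() for line in rest]
print(before)     # ['1', '2', '3']
print(remaining)  # ['4', '5']
```

Because chain is lazy, the remainder of the file is not read into memory; only the single stop line is held in a list.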

You don't need to change parse2. For clarity, I changed the print statement:

def parse2(file):
    # 2nd parsing function
    for line in file:
        # Parse the line (just print for example)
        print('parse2: ' + line)

      



Then you can call parse1 and parse2 in a functional style:

with open('testfile','r') as infile:
   parse2(parse1(infile,'4'))

      

The output of the line above is:

parse1: 1
parse1: 2
parse1: 3
parse2: 4
parse2: 5
parse2: 6
parse2: 7
parse2: 8
parse2: 9

      

Notice that the line '4' is handled by parse2.

+2




I suggest making a copy (see note 1) of your file object, looping over the copy, and advancing the original only in the else block, so that the original stays one line behind when the stop condition is hit. It is also more Pythonic to open the file with a with statement, which closes the file at the end of the block. Here the second function is defined and called inside the first one:

#ex.txt

1
2
3
4
5
6
7
8
9
10

      

You can use itertools.tee to create a copy (see note 1) of your file object:

from itertools import tee

def parse1(file_name, stop):

    def parse2(file_obj):
        print('**********')
        for line in file_obj:
            print(line)

    with open(file_name) as file_obj:
        # temp runs ahead; file_obj is advanced only for lines already parsed
        temp, file_obj = tee(file_obj)
        for line in temp:
            if line.strip() == stop:
                break
            else:
                next(file_obj)
                print(line)
        parse2(file_obj)

parse1("ex.txt", '4')

      



result:

1

2

3

**********
4

5

6

7

8

9

10

      


1) itertools.tee doesn't actually create a copy, but you can use it for this purpose. According to the documentation, it returns n independent iterators from a single iterable. You can assign one of these independent iterators back to the name of the object being iterated over and keep the other one as temp.
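
As a rough illustration of what "independent iterators" means here, the sketch below runs tee on a small in-memory iterator. The sample data and the names ahead and behind are made up for the example.

```python
from itertools import tee

numbers = iter(["1", "2", "3", "4"])  # hypothetical sample data
ahead, behind = tee(numbers)

first_ahead = next(ahead)    # advances only `ahead`
second_ahead = next(ahead)
first_behind = next(behind)  # `behind` still starts from the beginning

rest_behind = list(behind)   # everything `behind` has not yielded yet
print(first_ahead, second_ahead, first_behind)  # 1 2 1
print(rest_behind)  # ['2', '3', '4']
```

tee has to buffer every item that one iterator has consumed and the other has not. In the parse1 code above the two iterators are never more than one line apart, so that buffer should stay small even for large files.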

+1




IMHO, the simplest solution is for the first parser to return the line on which it hit the stop condition, and to pass that line on to the second parser. The second parser should have an explicit helper that parses a single line, to avoid code duplication:

def parse1(file, stop):
    # 1st parsing function (Main function I am doing)
    for line in file:
        if line.strip() == stop:
            # Stop parsing on condition and hand the line back to the caller
            return line
        else:
            # Parse the line (just print for example)
            print(line)
    return None

def parse2(file, line=None):
    # 2nd parsing function (Will be my own functions or external functions)
    def doParse(line):
        # do actual parsing (just print for example)
        print(line)
    # First parse the line carried over from parse1, if there is one
    if line is not None:
        doParse(line)
    for line in file:
        doParse(line)

# main
...
stop_line = parse1(file, "4")
if stop_line is not None:
    parse2(file, stop_line)
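
For a version that can be run without a file on disk, here is a minimal sketch of the same idea. It assumes an in-memory io.StringIO object in place of the real file, and both functions return lists instead of printing so the hand-over of the stop line is easy to check. The parameter name first_line and the sample data are hypothetical.

```python
import io

def parse1(file, stop):
    """Collect lines until the stop line; return (parsed lines, stop line or None)."""
    parsed = []
    for line in file:
        if line.strip() == stop:
            return parsed, line
        parsed.append(line.strip())
    return parsed, None

def parse2(file, first_line=None):
    """Parse an optional carried-over line first, then the rest of the file."""
    parsed = []
    if first_line is not None:
        parsed.append(first_line.strip())
    for line in file:
        parsed.append(line.strip())
    return parsed

source = io.StringIO("1\n2\n3\n4\n5\n6\n")  # in-memory stand-in for a real file
head, stop_line = parse1(source, "4")
tail = parse2(source, stop_line) if stop_line is not None else []
print(head)  # ['1', '2', '3']
print(tail)  # ['4', '5', '6']
```

The trade-off of this approach is that the second parser has to accept the extra argument, so it fits best when you control its signature.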

      

0

