Why is chaining iterables difficult? Simplify this code

I want to chain multiple iterators, all lazily evaluated (speed is critical), to do the following:

  • read many integers from one huge stdin line
  • split() that string
  • convert the resulting strings to int
  • calculate the difference between consecutive ints
  • ... and some other things not shown here.

The real example is more complex; here's a simplified version:

Here's an example of a stdin line: 2 13 4 16 16 15 22 17 8 8 7 6

(For debugging purposes, instream below may point to sys.stdin or to an open file object.)

You can't simply chain the generators, because map(str.split, instream) lazily yields a list for each line:

import itertools
gen1 = map(int, (map(str.split, instream))) # CAN'T CHAIN DIRECTLY

      

The best working solution I've found is the following. Can't it be simplified?

gen1 = map(int, itertools.chain.from_iterable(itertools.chain(map(str.split, instream))))

      

Why the hell do I need to chain itertools.chain.from_iterable(itertools.chain(...)) just to process the result of map(str.split, instream)? Doesn't that kind of defeat the purpose? Would manually defining my generators be faster?

+3




2 answers


An explicit ("manual") generator expression should be preferred over map and filter. It is more readable for most people and more flexible.

If I understand your question, this generator expression does what you need:



gen1 = ( int(x) for line in instream for x in line.split() )
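The question also asks for the difference between consecutive ints. As a rough sketch of how the generator expression could feed that next step while staying lazy: the helper name consecutive_diffs is hypothetical (it is not from the question), and io.StringIO stands in for instream, using the sample line from the question.

```python
import io
import itertools

def consecutive_diffs(numbers):
    """Lazily yield b - a for each adjacent pair (a, b) in numbers."""
    first, second = itertools.tee(numbers)
    next(second, None)  # advance the second copy by one element
    return (b - a for a, b in zip(first, second))

# io.StringIO is a stand-in for sys.stdin or an open file (assumption).
instream = io.StringIO("2 13 4 16 16 15 22 17 8 8 7 6\n")
gen1 = (int(x) for line in instream for x in line.split())
diffs = list(consecutive_diffs(gen1))
print(diffs)  # [11, -9, 12, 0, -1, 7, -5, -9, 0, -1, -1]
```

Nothing is read from the stream until list() starts consuming the chain. Note that line.split() still builds a full list of tokens per line, so one huge line is held in memory at once.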

      

+2




You can create your generator manually:

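import io
import string

def gen1(stream):
    # Presumes that stream is an io.TextIOBase (e.g. sys.stdin or io.StringIO).
    s = ""                  # digits of the number currently being read
    c = stream.read(1)
    while c:                # read(1) returns "" at EOF
        if c not in string.digits:
            # A non-digit ends the current number, if there is one.
            if s:
                yield int(s)
                s = ""
        else:
            s += c
        c = stream.read(1)

    # Emit the last number if the input did not end with a separator.
    if s:
        yield int(s)


g = gen1(io.StringIO("12 45  6 7 88"))
for x in g:    # dangerous if stream is unlimited
    print(x)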

      

This is certainly not the prettiest code, but it does what you want. Explanations:

If your input is infinitely long, you should read it in chunks (or character by character). Whenever you come across a non-digit (such as a space), you convert the characters read up to that point into an integer and yield it. You should also consider what happens when you reach EOF. My implementation probably performs poorly because it reads one character at a time. Reading in chunks can speed it up a lot.
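A minimal sketch of the chunked variant mentioned above, under some assumptions: the function name ints_chunked and the default chunk_size are hypothetical, and tokens are separated by whitespace (unlike gen1, which treats any non-digit as a separator). A number cut in half at a chunk boundary is carried over to the next chunk.

```python
import io

def ints_chunked(stream, chunk_size=65536):
    """Yield ints from a text stream, reading chunk_size characters at a time."""
    tail = ""  # possibly incomplete token carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # read() returns "" at EOF
            break
        data = tail + chunk
        tokens = data.split()
        # If the chunk ends in the middle of a number, hold the last token back.
        if tokens and not data[-1].isspace():
            tail = tokens.pop()
        else:
            tail = ""
        for tok in tokens:
            yield int(tok)
    # Emit the final token if the input did not end with whitespace.
    if tail:
        yield int(tail)

# A tiny chunk size forces numbers to straddle chunk boundaries.
values = list(ints_chunked(io.StringIO("12 45  6 7 88"), chunk_size=4))
print(values)  # [12, 45, 6, 7, 88]
```

Because it splits on whitespace and calls int() on each token, this version accepts negative numbers but raises ValueError on non-numeric tokens.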

EDIT why your approach will never work:

map(str.split, instream)

      



just doesn't do what you seem to think it does. map applies the given function, str.split, to each element of the iterable passed as the second parameter. In your case that is instream, a file object (for sys.stdin, specifically an io.TextIOBase object). Such an object can indeed be iterated over, but line by line, which is emphatically NOT what you want! In effect, you iterate over the input line by line and split each line into words. The map object yields (many) lists of words, NOT a single list of words. This is why you need to chain them together to get one flat iterable.
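To make this concrete, here is a small illustration with io.StringIO standing in for instream (an assumption for the demo). Each element produced by map(str.split, ...) is a whole list, so passing it straight to int fails.

```python
import io

instream = io.StringIO("12 45 \n 66 7 88")

# One list of words per input line, not one flat sequence of words.
per_line = list(map(str.split, instream))
print(per_line)  # [['12', '45'], ['66', '7', '88']]

# int() cannot convert a list, which is why map(int, map(str.split, ...)) fails.
try:
    int(per_line[0])
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```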

Moreover, the itertools.chain() in itertools.chain.from_iterable(itertools.chain(map(...))) is redundant. itertools.chain concatenates its arguments (each of them an iterable) into a single iterator. You only give it one argument, so there is nothing to concatenate; it simply yields the elements of the map object unchanged. itertools.chain.from_iterable(), on the other hand, takes one argument, which is expected to be an iterable of iterables (like a list of lists), and flattens it into a single iterator.
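The difference can be checked on a plain list of lists (sample data chosen for illustration):

```python
import itertools

nested = [["12", "45"], ["66", "7", "88"]]

# chain() with a single argument just re-yields that argument's elements.
same = list(itertools.chain(nested))

# from_iterable() flattens one level of nesting.
flat = list(itertools.chain.from_iterable(nested))

# chain(*nested) is equivalent, but unpacking consumes the outer iterable eagerly.
star = list(itertools.chain(*nested))

print(same)  # [['12', '45'], ['66', '7', '88']]
print(flat)  # ['12', '45', '66', '7', '88']
```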

EDIT2

import io, itertools

instream = io.StringIO("12 45 \n 66 7 88")
gen1 = itertools.chain.from_iterable(map(str.split, instream))
gen2 = map(int, gen1)
list(gen2)

      

returns

[12, 45, 66, 7, 88]

      

0

