Understanding the len function with iterators

While reading the documentation, I noticed that the built-in function len doesn't support all iterables, but only sequences and mappings (and sets). Before reading this, I always thought that len used the iteration protocol to compute the length of an object, so I was very surprised to see this.

I have read the questions already posted (here and here), but I am still confused: I still don't see a real reason why len should not work with all iterables in general.

Is the reason more conceptual / logical than implementation-related? I mean, when I ask for the length of an object, I am asking for one property (how many elements it has), a property that objects such as generators do not have, because they do not hold elements inside; they produce elements.

In addition, generator objects can yield infinitely many elements, which leads to an undefined length. That cannot happen with other objects such as lists, tuples, dicts, etc.

So, am I right, or is there something more that I am not considering?

+3




1 answer


The biggest reason is that it reduces type safety.

How many programs have you written where you really needed to consume an iterable just to know how many elements it had, throwing away everything else?

In years of coding in Python, I have never really needed this. It is a nonsensical operation in normal programs. An iterator may not have a length at all (for example, infinite iterators, or generators that expect input through send()), so asking for it doesn't make much sense. The fact that len(an_iterator) raises an error means that you can find bugs in your code. You can see that in a certain part of the program you are calling len on the wrong thing, or that your function really needs a sequence instead of an iterator as you expected.
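
As a small illustration of that error (the names squares and numbers below are made up for this sketch), calling len on a generator fails immediately, while the same data in a list works:

```python
def squares(n):
    """Lazily yield the squares of 0..n-1."""
    for i in range(n):
        yield i * i


numbers = squares(5)

try:
    len(numbers)
    error_message = None
except TypeError as exc:
    # Generators do not define __len__, so len() refuses them.
    error_message = str(exc)

print(error_message)

# A list is a sequence, so len() is well defined for it.
as_list = list(squares(5))
print(len(as_list))
```

The TypeError shows up at the exact call site, which is what makes the mistake easy to spot.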

Removing this error would create a new class of bugs, where people calling len erroneously consume an iterator, or use an iterator as if it were a sequence without realizing it.

If you really need to know the length of an iterator, what's wrong with len(list(iterator))? The extra 6 characters? It's trivial to write your own version that works for iterators, but, as I said, 99% of the time needing it just means something is wrong with your code, because the operation doesn't make much sense.
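
One possible version of such a helper is sketched below. The name count_items is hypothetical, not something from the standard library, and the sketch assumes the iterable is finite:

```python
def count_items(iterable):
    """Count items by iterating; O(n), and it exhausts iterators."""
    return sum(1 for _ in iterable)


gen = (c for c in "hello")
n = count_items(gen)      # walks through (and consumes) the generator
leftover = list(gen)      # the generator is now exhausted

print(n, leftover)
```

Unlike len(list(iterator)), this does not build an intermediate list, but the iterator is still used up afterwards.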

The second reason is that, with this change, you would be breaking two nice properties of len that currently hold for all (known) containers:

  • It is known to be cheap on all containers ever implemented in Python (all built-ins, the standard library, numpy and scipy, and all other major third-party libraries do this, both for dynamically sized containers and for statically sized ones). So when you see len(something), you know that the len call is cheap. Making it work with iterators would mean that suddenly any program could become inefficient due to length computations.

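    Also note that, currently, it is trivial to implement an O(1) __len__, whereas for iterators the length could only be computed by consuming them, which is O(n).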

    In short: everyone currently implements __len__ in O(1), and it's easy to keep doing so. Thus there is an expectation that calls to len are O(1), even if this is not part of the standard. Python developers deliberately avoid C/C++-style legalese in their documentation and trust users. In this case, if your __len__ is not O(1), you are expected to document that.

  • It is known to be non-destructive. Any reasonable implementation of __len__ does not change its argument. Therefore, you can be sure that len(x) == len(x), or that n = len(x); len(list(x)) == n.

    This property is not defined in the documentation either, yet everyone expects it and, currently, nobody violates it.



These properties are good because they let you reason and make assumptions about code that uses len. They can help you ensure that a piece of code is correct, or understand its asymptotic complexity. The change you propose would make it much harder to look at some code and tell whether it is correct or what its complexity is, because you would always have to keep the special cases in mind.
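
The two properties can be sketched as follows. Both Stack and consuming_len are hypothetical names invented for this example; consuming_len stands in for a len that accepted any iterable:

```python
class Stack:
    """Hypothetical container that tracks its size as items are added."""

    def __init__(self):
        self._items = []
        self._size = 0

    def push(self, item):
        self._items.append(item)
        self._size += 1

    def __len__(self):
        # O(1): return the stored counter, no iteration needed.
        return self._size

    def __iter__(self):
        return iter(self._items)


def consuming_len(iterable):
    """Stand-in for a len that accepts any iterable: O(n), exhausts iterators."""
    return sum(1 for _ in iterable)


stack = Stack()
for value in (1, 2, 3):
    stack.push(value)

# For a container, asking twice gives the same answer.
container_stable = len(stack) == len(stack)

# For an iterator, the first call uses it up, so the second call disagrees.
it = iter(stack)
first = consuming_len(it)
second = consuming_len(it)

print(container_stable, first, second)
```

With the container, len(x) == len(x) holds and costs nothing; with the iterator, the same comparison silently fails because the first count destroyed the data.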

All in all, the change you are proposing has one really small pro, saving a few characters in particular situations, but it has several big disadvantages that would affect a huge portion of existing code.


Another minor reason: if len consumed iterators, I'm sure some people would start to abuse it for its side effects (replacing the already ugly use of map or list comprehensions). Suddenly people could write code like:

len(print(something) for ... in ...)

      

to print text, which is really just ugly. It doesn't read well. Stateful code should be relegated to statements, since they provide a visual cue of side effects.

+7

