Reverse slice from end of list to specific index

Let's say I want a slice from the end of a sequence seq back to the first occurrence of a given element x (inclusive). A naive attempt, seq[-1:seq.index(x)-1:-1], contains a subtle bug:

seq = 'abc'
seq[-1:seq.index('b')-1:-1]  # 'cb' as expected
seq[-1:seq.index('a')-1:-1]  # '' because -1 is interpreted as end of seq
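
One way to see why the second expression comes out empty is to ask Python how it resolves the slice bounds. This is only a small sketch using the built-in slice.indices method, not part of the original question:

```python
seq = 'abc'

# slice.indices(length) returns the concrete (start, stop, step) that
# Python uses for a sequence of that length.
print(slice(-1, 0, -1).indices(len(seq)))       # stop when index('b') - 1 == 0
print(slice(-1, -1, -1).indices(len(seq)))      # stop when index('a') - 1 == -1
print(slice(None, None, -1).indices(len(seq)))  # bounds used by seq[::-1]

# A stop of -1 is normalized to len(seq) - 1, the same position as the
# start, so the slice is empty. Only an omitted stop (None) reaches index 0.
assert seq[-1:-1:-1] == ''
assert seq[-1::-1] == 'cba'
```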

      

Is there an idiomatic way to write this?

seq[seq.index(x):][::-1] works fine, but it seems inefficient for large sequences because it creates an extra copy. (I want a sequence in the end, so one copy is needed; I just don't want to create a second one.)

As a side note, this is a very easy mistake to make: it can pass many tests and is undetectable by any static analyzer (unless it warns about every negative step).

Update

There seems to be no perfect / idiomatic solution. I agree that this is probably a bottleneck less often than I thought, so I will use [pos:][::-1] most of the time. When performance matters, I would use an ordinary if check. However, I will accept the solution that I found interesting, although hard to read; it is probably usable in some rare cases (where I really need to fit the whole thing into one expression and don't want to define a new function).

Also, I tried timing this. For lists, there always seems to be a penalty of roughly 2x for the extra slice, even when it is as short as two items. For strings, the results are extremely inconsistent, to the point where I can't say anything:

import timeit
for n in (2, 5, 10, 100, 1000, 10000, 100000, 1000000):
    c = list(range(n))
    # c = 'x' * n
    pos = n // 2 # pretend the item was found in the middle
    exprs = 'c[pos:][::-1]', 'c[:pos-1:-1] if pos else c[::-1]'
    results = [timeit.Timer(expr, globals=globals()).autorange() for expr in exprs]
    times = [t/loops for loops, t in results]
    print(n, times[0]/times[1])

      

Results for lists (ratio of extra slice / no extra slice times):

2 2.667782437753884
5 2.2672817613246914
10 1.4275235266754878
100 1.6167102119737584
1000 1.7309116253903338
10000 3.606259720606781
100000 2.636049703318956
1000000 1.9915776615090277

      

Of course, this ignores the fact that whatever we do with the resulting slice is much more expensive, in relative terms, when the slice is short. Still, I agree that the extra [::-1] is usually fine for small sequences.


3 answers


If an iterator is an acceptable result, take a forward slice and call reversed on it:

reversed(seq[seq.index(whatever):])

      

If not, subtract an extra len(seq) from the end point:



seq[:seq.index(whatever)-len(seq)-1:-1]
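
As a sanity check, the single-slice form can be compared with the two-copy version for every position. This is a rough sketch; reversed_tail is a hypothetical helper name used only for illustration:

```python
def reversed_tail(seq, i):
    """Reverse of seq[i:] in one slice (hypothetical helper name)."""
    # i - len(seq) is the negative spelling of index i; one more step
    # down gives a stop that can never be mistaken for -1 ("last item").
    return seq[:i - len(seq) - 1:-1]

for seq in ('abc', [1, 2, 3, 4], (0,)):
    for i in range(len(seq)):
        expected = seq[i:][::-1]  # two-copy reference version
        assert reversed_tail(seq, i) == expected
        # reversed() yields the same items lazily, without the second copy
        assert list(reversed(seq[i:])) == list(expected)
```

The stop value ranges from -len(seq)-1 (for i == 0) to -2 (for the last index), so it never lands on the ambiguous -1.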

      

Or just take the forward slice, slice it again to reverse it, and eat the cost of the extra copy. This is probably not your bottleneck.

Whatever you do, leave a comment explaining it so people don't reintroduce the bug when editing, and write a unit test for that case.
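
A minimal sketch of what such a comment and regression test might look like, using the standard unittest module. The names reversed_from and ReversedFromTest are hypothetical, and the explicit branch is just one of the possible fixes:

```python
import unittest

def reversed_from(seq, x):
    """Return seq from its end back to the first x, inclusive.

    Hypothetical helper. The explicit branch matters: when x is at
    index 0, a stop of pos - 1 == -1 would mean "last item", not
    "before the first item", and the slice would come out empty.
    """
    pos = seq.index(x)
    return seq[:pos - 1:-1] if pos else seq[::-1]

class ReversedFromTest(unittest.TestCase):
    def test_match_in_middle(self):
        self.assertEqual(reversed_from('abc', 'b'), 'cb')

    def test_match_at_index_zero(self):
        # Regression test for the pos == 0 edge case.
        self.assertEqual(reversed_from('abc', 'a'), 'cba')

    def test_match_at_last_index(self):
        self.assertEqual(reversed_from([1, 2, 3], 3), [3])
```

Saved in a module, the tests can be run with python -m unittest.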



IMHO, seq[seq.index(x):][::-1] is the most readable solution, but here is a way that is slightly more efficient.

def sliceback(seq, key):
    pos = seq.index(key)
    return seq[:pos-1 if pos else None:-1]

seq = 'abc'
for k in seq:
    print(k, sliceback(seq, k)) 

      

Output

a cba
b cb
c c

      


As Budo Zindovic mentions in the comments, .index will raise an exception if the char is not found in the string. Depending on the context, the code may never be called with a char that is not in seq, but if that can happen, we need to handle it. The easiest way to do this is to catch the exception:



def sliceback(seq, key):
    try:
        pos = seq.index(key)
    except ValueError:
        return ''
    return seq[:pos-1 if pos else None:-1]

seq = 'abc'
for k in 'abcd':
    print(k, sliceback(seq, k)) 

      

Output

a cba
b cb
c c
d 

      

Python exception handling is very efficient. When the exception is not actually raised, it is faster than the equivalent if-based code, but if the exception is raised more than 5-10% of the time, it is faster to use if.
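
The trade-off can be measured directly. The following is only a rough timing sketch: with_try and with_if are hypothetical names, the list size and repeat count are arbitrary, and the numbers will vary by machine and Python version:

```python
import timeit

seq = list(range(100))

def with_try(key):
    # Ask forgiveness: one scan, plus exception overhead on a miss.
    try:
        return seq.index(key)
    except ValueError:
        return -1

def with_if(key):
    # Look before you leap: the membership test scans the list once
    # and index() scans it again on a hit.
    return seq.index(key) if key in seq else -1

for label, key in (('hit', 50), ('miss', -5)):
    t_try = timeit.timeit(lambda: with_try(key), number=20_000)
    t_if = timeit.timeit(lambda: with_if(key), number=20_000)
    print(f'{label}: try/except {t_try:.4f}s, if {t_if:.4f}s')
```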

Instead of testing for key before calling seq.index, it's more efficient to use find. Of course, this will only work if seq is a string; it won't work if seq is a list, because (annoyingly) lists don't have a .find method.

def sliceback(seq, key):
    pos = seq.find(key)
    return '' if pos < 0 else seq[:pos-1 if pos else None:-1]
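
For lists and tuples, which lack .find, one option is to fall back to the try/except form and return an empty sequence of the same type. A sketch under that assumption (sliceback_any is a hypothetical name):

```python
def sliceback_any(seq, key):
    """Like sliceback, but for any sequence with .index (hypothetical name)."""
    try:
        pos = seq.index(key)
    except ValueError:
        return seq[:0]  # empty sequence of the same type as seq
    # None as the stop means "run past index 0", avoiding the -1 trap.
    return seq[:pos - 1 if pos else None:-1]

print(sliceback_any('abc', 'a'))
print(sliceback_any([1, 2, 3], 2))
print(sliceback_any(('x', 'y'), 'z'))
```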

      

+3


source


You can check pos when assigning the result, for example:

result = seq[-1:pos-1:-1] if pos > 0 else seq[::-1]

      

input:

pos = seq.index('a')

      

output:

cba

      
