Longest sequence of consecutive letters

Suppose I have a string of lowercase letters like

'ablccmdnneofffpg'

      

And my goal is to find the longest sequence of consecutive numbers within that string, which in this case:

'abcdefg'

      

An intuitive attempt is to find a loop around each letter and get the longest sequence starting with that letter. One possible solution is

longest_length = 0
start = None
current_start = 0
while current_start < len(word) - longest_length:
    current_length = 1
    last_in_sequence = ord(word[current_start])
    for i in range(current_start + 1, len(word)):
        if ord(word[i]) - last_in_sequence == 1:
            current_length += 1
            last_in_sequence = ord(word[i])
    if current_length > longest_length:
        longest_length = current_length
        start = current_start
    while (current_start < len(word) - 1 and
           ord(word[current_start + 1]) - ord(word[current_start]) == 1):
        current_start += 1
    current_start += 1

      

Are there other ways to solve the problem with fewer lines or even using some pythonic methods?

+3


source to share


4 answers


You can keep track of all subsequences of consecutive characters as shown in a string using a dictionary and then take the one that is the longest.

Each subsequence is entered by the next candidate in the alphabet, so that once the expected candidate is reached in the string, it is used to update the value of the corresponding subsequence in the dictionary and add the following alphabet as the new dictionary value:



def longest_sequence(s):
    d = {}
    for x in s:
       if x in d:
           d[chr(ord(x)+1)] = d[x] + x
       else:
           d[chr(ord(x)+1)] = x
    return max(d.values(), key=len)

print(longest_sequence('ablccmdnneofffpg'))
# abcdefg
print(longest_sequence('ba'))
# b
print(longest_sequence('sblccmtdnneofffpgtuyvgmmwwwtxjyuuz'))
# stuvwxyz

      

+6


source


A solution that trades memory for (some) time:

It keeps track of all the sequences it sees, and then prints the longest ones found (although there may be several) at the end.



from contextlib import suppress


class Sequence:
    def __init__(self, letters=''):
        self.letters = letters
        self.last = self._next_letter(letters[-1:])

    def append(self, letter):
        self.letters += letter
        self.last = self._next_letter(letter)

    def _next_letter(self, letter):
        with suppress(TypeError):
            return chr(ord(letter) + 1)
        return 'a'

    def __repr__(self):
        return 'Sequence({}, {})'.format(repr(self.letters),
                                         repr(self.last))


word = 'ablccmdnneofffpg'
sequences = []
for letter in word:
    for s in sequences:
        if s.last == letter:
            s.append(letter)
            break
    else:
        sequences.append(Sequence(letters=letter))

sequences = list(sorted(sequences, key=lambda s: len(s.letters), reverse=True))
print(sequences[0].letters)

      

+1


source


Basically you are asking longest increasing subsequence

which is a well researched problem. Check out the pseudocode on Wikipedia.

0


source


Similar to Moses Koledoi's solution , but only stores longitudes for the ordinals of characters and only builds the solution string at the end. So this should be a little more economical:

def longest_seq(s):
  d = {}
  for c in s:
    c, prev_c = ord(c), ord(c) - 1
    d[c] = max(d.get(c, 0), d.pop(prev_c, 0) + 1)
  c, l = max(d.items(), key=lambda i: i[1])
  return ''.join(map(chr, range(c-l+1, c+1)))

      

0


source







All Articles