Count substrings with fewer tails from a coin string

Given a sequence of heads and tails, I want to count the number of significant substrings that have at least as many heads as the number of tails. I want to achieve this in O (NlogN) time.

Input example: [ 'H', 'T', 'H', 'T', 'T', 'H' ]

Output example:

11

      

Explanation:

{H} {H} {H}
{H, T} {T, H} {H, T} {T, H}
{H, T, H} 
{H, T, H, T} 
{H, T, T, H} 
{H, T, H, T, T, H} 

      

I believe my current algorithm is O (N ^ 2). I'll solve the problem recursively by iterating over the list of coins sliced ​​at both ends.

Here is my current algorithm. How can I achieve O (NLogN) time?

def count_sequences( data ):
    print go(range(0,len(data)),data)
seen = set()
def go(rang,data):
    if tuple(rang) in seen: return 0
    seen.add(tuple(rang))
    h = 0
    summ = 0
    if len(rang)==0: return 0
    for i in rang:
        if data[i] == 'H': h += 1
        summ += go(rang[1:],data)
        summ += go(rang[:-1],data)
    if len(rang) == 1: 
        if h ==1: return 1
        else: return 0
    if h > (len(rang)-1)/2 : 
        return 1 + summ
    else: return summ

      

+3


source to share


1 answer


Here's an O (n) solution.

Imagine instead of H and T, you had 1 and -1 in your array. This reduces the problem to calculating the number of non-negative total subarrays. This is a known problem and can be solved by calculating the accumulated array and finding the number of inversions.

This can be computed naively in O (n ^ 2) by looking for pairs i <j where A [i]> A [j]. It can be optimized for O (n log n) using the merge sort variant.



But in this case, the values ​​in the array can be at most n, and consecutive values ​​have an absolute difference of exactly 1, so we can create an algorithm that calculates these inversions "on the fly" in O (n):

def solve(A):
    count = 0
    accum = 0
    total = 0
    seen = {0: 1}

    for i, element in enumerate(A):
        if element == 'H': 
            count += 1
            accum -= seen.get(count, 0)
        else:
            accum += seen.get(count, 0)
            count -= 1

        seen[count] = seen.get(count, 0) + 1
        total += (i + 1 - accum)

    return total

print solve([ 'H', 'T', 'H', 'T', 'T', 'H' ])

print solve([ 'T', 'H', 'H', 'H', 'T' ])

      

This algorithm is largely based on what I read here .

+2


source







All Articles