Count substrings with fewer tails from a coin string

Question

Count substrings with fewer tails from a coin string

Given a sequence of heads and tails, I want to count the number of significant substrings that have at least as many heads as the number of tails. I want to achieve this in O (NlogN) time.

Input example: [ 'H', 'T', 'H', 'T', 'T', 'H' ]

Output example:

Explanation:

{H} {H} {H}
{H, T} {T, H} {H, T} {T, H}
{H, T, H} 
{H, T, H, T} 
{H, T, T, H} 
{H, T, H, T, T, H}

I believe my current algorithm is O (N ^ 2). I'll solve the problem recursively by iterating over the list of coins sliced at both ends.

Here is my current algorithm. How can I achieve O (NLogN) time?

def count_sequences( data ):
    print go(range(0,len(data)),data)
seen = set()
def go(rang,data):
    if tuple(rang) in seen: return 0
    seen.add(tuple(rang))
    h = 0
    summ = 0
    if len(rang)==0: return 0
    for i in rang:
        if data[i] == 'H': h += 1
        summ += go(rang[1:],data)
        summ += go(rang[:-1],data)
    if len(rang) == 1: 
        if h ==1: return 1
        else: return 0
    if h > (len(rang)-1)/2 : 
        return 1 + summ
    else: return summ

+3

python algorithm

kilojoules 11 jan. '15 at 6:46

source to share

1 answer

Juan Lopes · Accepted Answer · 2015-08-19T01:39:37+0000

Here's an O (n) solution.

Imagine instead of H and T, you had 1 and -1 in your array. This reduces the problem to calculating the number of non-negative total subarrays. This is a known problem and can be solved by calculating the accumulated array and finding the number of inversions.

This can be computed naively in O (n ^ 2) by looking for pairs i <j where A [i]> A [j]. It can be optimized for O (n log n) using the merge sort variant.

But in this case, the values in the array can be at most n, and consecutive values have an absolute difference of exactly 1, so we can create an algorithm that calculates these inversions "on the fly" in O (n):

def solve(A):
    count = 0
    accum = 0
    total = 0
    seen = {0: 1}

    for i, element in enumerate(A):
        if element == 'H': 
            count += 1
            accum -= seen.get(count, 0)
        else:
            accum += seen.get(count, 0)
            count -= 1

        seen[count] = seen.get(count, 0) + 1
        total += (i + 1 - accum)

    return total

print solve([ 'H', 'T', 'H', 'T', 'T', 'H' ])

print solve([ 'T', 'H', 'H', 'H', 'T' ])

This algorithm is largely based on what I read here .

Count substrings with fewer tails from a coin string

More articles: