Count substrings with fewer tails from a coin string
Given a sequence of heads and tails, I want to count the number of significant substrings that have at least as many heads as the number of tails. I want to achieve this in O (NlogN) time.
Input example:
[ 'H', 'T', 'H', 'T', 'T', 'H' ]
Output example:
11
Explanation:
{H} {H} {H} {H, T} {T, H} {H, T} {T, H} {H, T, H} {H, T, H, T} {H, T, T, H} {H, T, H, T, T, H}
I believe my current algorithm is O (N ^ 2). I'll solve the problem recursively by iterating over the list of coins sliced ββat both ends.
Here is my current algorithm. How can I achieve O (NLogN) time?
def count_sequences( data ):
print go(range(0,len(data)),data)
seen = set()
def go(rang,data):
if tuple(rang) in seen: return 0
seen.add(tuple(rang))
h = 0
summ = 0
if len(rang)==0: return 0
for i in rang:
if data[i] == 'H': h += 1
summ += go(rang[1:],data)
summ += go(rang[:-1],data)
if len(rang) == 1:
if h ==1: return 1
else: return 0
if h > (len(rang)-1)/2 :
return 1 + summ
else: return summ
source to share
Here's an O (n) solution.
Imagine instead of H and T, you had 1 and -1 in your array. This reduces the problem to calculating the number of non-negative total subarrays. This is a known problem and can be solved by calculating the accumulated array and finding the number of inversions.
This can be computed naively in O (n ^ 2) by looking for pairs i <j where A [i]> A [j]. It can be optimized for O (n log n) using the merge sort variant.
But in this case, the values ββin the array can be at most n, and consecutive values ββhave an absolute difference of exactly 1, so we can create an algorithm that calculates these inversions "on the fly" in O (n):
def solve(A):
count = 0
accum = 0
total = 0
seen = {0: 1}
for i, element in enumerate(A):
if element == 'H':
count += 1
accum -= seen.get(count, 0)
else:
accum += seen.get(count, 0)
count -= 1
seen[count] = seen.get(count, 0) + 1
total += (i + 1 - accum)
return total
print solve([ 'H', 'T', 'H', 'T', 'T', 'H' ])
print solve([ 'T', 'H', 'H', 'H', 'T' ])
This algorithm is largely based on what I read here .
source to share