Python speed difference

I recently tried to solve this problem on HackerRank. I came up with the following solution, which gives the correct answer within the given time limit:

from collections import Counter
lisa=[]
for each in range(input()):
    current=raw_input()
    count=0
    lookup_dict={}
    i=0
    for i in range(len(current)):
        for j in range(i,len(current)):
            sub_string=current[i:j+1]
            sub_string=''.join(sorted(sub_string))
            if sub_string in lookup_dict.keys():
                lookup_dict[sub_string]+=1
            else:
                lookup_dict[sub_string]=1

    for k in lookup_dict.values():
        count+=k*(k-1)/2
    print count


My problem is that the solution they provided (reproduced below) is significantly faster than the one I wrote, even though the complexity and method used are the same.

from collections import *
for i in range(input()):
    s = raw_input()
    check = defaultdict(int)
    l = len(s)
    for i in range(l):
        for j in range(i+1,l+1):
            sub = list(s[i:j])
            sub.sort()
            sub = "".join(sub)
            check[sub]+=1
    sum = 0
    for i in check:
        n = check[i]
        sum += (n*(n-1))/2
    print sum


I'm guessing it has something to do with defaultdict, but I can't figure out why.





2 answers


In Python 2.x, dict.keys() returns a list, and in your first solution you actually do:

if sub_string in lookup_dict.keys()

This is an O(n) operation (where n is the size of the dictionary lookup_dict), because .keys() actually builds a list and a containment check against a list is O(n); on top of that there is the added cost of creating the list in the first place.

In the second approach there is no such check: the defaultdict handles the missing-key case for you automatically. That is likely the main reason why your first solution is significantly slower than the second (keep in mind that the dict.keys() lookup sits in the innermost loop, so it is performed a very large number of times).




An example showing the list returned by dict.keys():

>>> d = {1:2,3:4,5:6,7:8}
>>> d.keys()
[1, 3, 5, 7]
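
As a side note (a sketch, not part of either answer): testing membership directly on the dictionary avoids building that list entirely, so the check in the question's inner loop could be written as

# Hypothetical rewrite of the check from the question's code;
# "sub_string in lookup_dict" hashes the key directly, O(1) on average.
if sub_string in lookup_dict:
    lookup_dict[sub_string] += 1
else:
    lookup_dict[sub_string] = 1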






Speaking of defaultdict: it is a bit better optimized than an explicit key check. That is:

x = defaultdict(int)
for i in xrange(...):
    x[i] += 1


runs roughly 50% faster than



x = {}
for i in xrange(...):
    if i in x:
         x[i] +=1
    else:
        x[i] = 1


when all of the keys are missing.
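
If you want to reproduce that figure yourself, a minimal Python 2 micro-benchmark along these lines should do (the loop size and number of repetitions are arbitrary choices, not taken from the answer):

import timeit

# Each statement builds its own fresh dict/defaultdict, so every key
# really is missing the first time it is seen.
dd_stmt = """
x = defaultdict(int)
for i in xrange(1000):
    x[i] += 1
"""

plain_stmt = """
x = {}
for i in xrange(1000):
    if i in x:
        x[i] += 1
    else:
        x[i] = 1
"""

print "defaultdict:", timeit.timeit(dd_stmt, setup="from collections import defaultdict", number=1000)
print "plain dict: ", timeit.timeit(plain_stmt, number=1000)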

But the main point is that calling dict.keys() in Python 2 actually creates a list. So to evaluate key in dict.keys(), memory must first be allocated for that list, the list must be filled with the actual keys, and only then is your key compared against it. Afterwards that list has to be cleaned up by the garbage collector, and on the next iteration of the for loop the same work is done all over again, with yet more memory allocated for another list. That is what really kills the performance in your example.
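
A quick interactive illustration (Python 2) that every .keys() call allocates a brand-new list:

>>> d = {1: 2, 3: 4, 5: 6}
>>> k1 = d.keys()
>>> k2 = d.keys()
>>> k1 == k2   # same contents
True
>>> k1 is k2   # but a different list object was built for each call
False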









