Optimize the permutation search loop (can't use itertools) which is very slow. Any suggestions?

This is a game in which you have 12 cards and you pick you until you pick 3 from the same group. I try to find the probability of choosing each group. The script I created works, but it is very slow. A colleague of mine has created a similar script in R with no functions, and his script takes 1 / 100th the time it takes mine. I'm just trying to figure out why. Any ideas are greatly appreciated.

from collections import Counter
import pandas as pd
from datetime import datetime

weight = pd.read_excel('V01Weights.xlsx')

      

The weight looks like this:

Symb    Weight
Grand   170000
Grand   170000
Grand   105
Major   170000
Major   170000
Major   215
Minor   150000
Minor   150000
Minor   12000
Bonus   105000
Bonus   105000
Bonus   105000

      

Max Picks represents the total number of different "cards". Total Picks represents the maximum number of custom picks. This is because after 8 variants you are guaranteed to have 2 of each type, so at the 9th plant you are guaranteed 3 matches.

TotalPicks = 9
MaxPicks = 12

      

This should have been named PickedProbabilities.

Picks = {0:0,1:0,2:0,3:0}

      

This is my simple version of the timeit class because I don't like the timeit class

def Time_It(function):
    start =datetime.now()

    x = function()

    finish = datetime.now()

    TotalTime = finish - start

    Minutes = int(TotalTime.seconds/60)
    Seconds = TotalTime.seconds % 60


    print('It took ' + str(Minutes) + ' minutes and ' + str(Seconds) + ' seconds')

    return(x)

      

Given x (my picks are ok) I find the probability. These selections are performed without replacement

def Get_Prob(x,weight):
    prob = 1
    weights = weight.iloc[:,1]

    for index in x:
        num = weights[index]
        denom = sum(weights)

        prob *= num/denom
        weights.drop(index, inplace = True)
        # print(weights)

    return(prob)

      

This is used to detect if there are duplicates in my loop because it is not valid

def Is_Allowed(x):
    return(len(x) == len(set(x)))

      

This determines whether there is a win in all currently available cards.

def Is_Win(x):
    global Picks

    WinTypes = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]

    IsWin = False

    for index,item in enumerate(WinTypes):
        # print(index)
        if set(item).issubset(set(x)):
            IsWin = True
            Picks[index] += Get_Prob(x,weight)
            # print(Picks[index])
            print(sum(Picks.values()))
            break

    return(IsWin)

      

This is my main function that runs through all the cards. I tried to do it using recursion, but I eventually gave up. I cannot use itertools to create all the permutations because eg [0,1,2,3,4] will be created by itertools, but that is not possible because as soon as you get 3 matches, the game is over.

def Cycle():

    for a in range(MaxPicks):
        x = [a]

        for b in range(MaxPicks):
            x = [a,b]

            if Is_Allowed(x):
                for c in range(MaxPicks):
                    x = [a,b,c]
                    if Is_Allowed(x):                    
                        if Is_Win(x):
                            # print(x)
                            continue
                        for d in range(MaxPicks):
                            x = [a,b,c,d]
                            if Is_Allowed(x):
                                if Is_Win(x):
                                    # print(x)
                                    continue
                            for e in range(MaxPicks):
                                x = [a,b,c,d,e]
                                if Is_Allowed(x):
                                    if Is_Win(x):
                                        continue
                                for f in range(MaxPicks):
                                    x = [a,b,c,d,e,f]
                                    if Is_Allowed(x):
                                        if Is_Win(x):
                                            continue
                                    for g in range(MaxPicks):
                                        x = [a,b,c,d,e,f,g]
                                        if Is_Allowed(x):
                                            if Is_Win(x):
                                                continue
                                        for h in range(MaxPicks):
                                            x = [a,b,c,d,e,f,g,h]
                                            if Is_Allowed(x):
                                                if Is_Win(x):
                                                    continue
                                            for i in range(MaxPicks):
                                                if Is_Allowed(x):
                                                    if Is_Win(x):
                                                        continue

      

Calls the main function

x = Time_It(Cycle)
print(x)

      

writes probabilities to a text file

with open('result.txt','w') as file:
    # file.write(pickle.dumps(x))
    for item in x:
        file.write(str(item) + ',' + str(x[item]) + '\n')

      

+3


source to share


3 answers


Ok this time I hope I understood your problem correctly :)

There are two understandings (I think you use them, just for completeness) required to speed up your program algorithmically:

  • The probabilities for the sequence (card_1, card_2)

    and are (card_2, card_1)

    not equal, so we cannot use the results from the urn problem, and it looks like we need to try all permutations.
  • However, given the set of cards that we have selected so far, we do not need information in what order they are selected - this is still for the future course of the game. Therefore, it is sufficient to use dynamic programming and calculate the probabilities for each subset that will go through during the game (so we need to check 2^N

    instead of states N!

    ).

For a set of selected cards, the set

probability of choosing a card i

in the next turn:

 norm:=sum Wi for i in set 
 P(i|set)=Wi/norm if i not in set else 0.0

      

Recursion for calculation P(set)

- the probability that a map of collected cards appeared during the game:

set_without_i:=set/{i}
P(set)=sum P(set_without_i)*P(i|set_without_i) for i in set

      

However, this should only be done set_without_i

for which the game has not yet ended, i.e. no group chose 3 cards.

This can be done with recursion + memoization or, like my version, using dynamic bottom-up programming. It also uses binary integer representation for set representations and (the most important part!) Returns the result almost instantly [('Grand', 0.0014104762718021384), ('Major', 0.0028878988709489244), ('Minor', 0.15321793072867956), ('Bonus', 0.84248369412856905)]

:

#calculates probability to end the game with 3 cards of a type


N=12

#set representation int->list
def decode_set(encoded):
    decoded=[False]*N
    for i in xrange(N):
        if encoded&(1<<i):
            decoded[i]=True
    return decoded

weights = [170000, 170000, 105, 170000, 170000, 215, 150000, 150000, 12000, 105000, 105000, 105000]     
def get_probs(decoded_set):
    denom=float(sum((w for w,is_taken in zip(weights, decoded_set) if not is_taken)))
    return [w/denom if not is_taken else 0.0 for w,is_taken in zip(weights, decoded_set)]

def end_group(encoded_set):
    for i in xrange(4):
       whole_group =  7<<(3*i) #7=..000111, 56=00111000 and so on
       if (encoded_set & whole_group)==whole_group:
           return i
    return None


#MAIN: dynamic program:

MAX=(1<<N)#max possible set is 1<<N-1
probs=[0.0]*MAX

#we always start with the empty set:
probs[0]=1.0    
#building bottom-up
for current_set in xrange(MAX):
    if end_group(current_set) is None:  #game not ended yet!
       decoded_set=decode_set(current_set)
       trans_probs=get_probs(decoded_set)
       for i, is_set in enumerate(decoded_set):
           if not is_set:
              new_set=current_set | (1<<i) 
              probs[new_set]+=probs[current_set]*trans_probs[i]

#filtering wins:
group_probs=[0.0]*4
for current_set in xrange(MAX):
   group_won=end_group(current_set)
   if group_won is not None:
      group_probs[group_won]+=probs[current_set]


print zip(["Grand", "Major", "Minor", "Bonus"], group_probs)

      


Some explanation of the "tricks" used in the code:



A pretty standard trick is to use an integer binary representation to encode the set. Let's say that we have objects [a,b,c]

, so we could represent the set {b,c}

as 110

, which would mean a

(first in the list matches 0

- the lowest digit) - not in a set, b(1)

in a set, c(1)

in a set. However, it 110

is read as an integer 6

.

The current_set - for loop mimics a game and is best understood during a game. Let's play with two cards [a,b]

with weights [2,1]

.

We run the game with an empty set 0

as an integer, so the probability vector (a given set, its binary representation and an integer mapped to probability):

  probs=[{}=00=0->1.0,  01={a}=1->0.0, {b}=10=2->0.0, {a,b}=11=3->0.0]

      

Let 's process current_set=0

, two possibilities to take a card 66% and take a card a

33% b

, so the probabilities become after processing:

 probs=[{}=00=0->1.0,  01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.0]

      

Now we are processing current_set=1={a}

, the only option is to take b

, so we are done with the set {a,b}

. Therefore, we need to update its ( 3={a,b}

) probability by our formula and get:

 probs=[{}=00=0->1.0,  01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.66]

      

In the next step we process 2

, and if set {b} is given, the only option is to select a card a

, so we need to update the probability of a set again{a,b}

probs=[{}=00=0->1.0,  01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->1.0]

      

We can get to {a,b}

on two different paths - this can be seen in our algorithm. The likelihood of going through the set {a,b}

at some point in our game is obviously 1.0

.

One more important thing: all paths leading to {a,b}

will be taken care of before processing this set (this will be the next step).

+1


source


A colleague of mine has created a similar script in R with no functions, and his script takes 1 / 100th the time it takes mine.

Two simple optimizations:



1) In line, the function calls, for example, Is_Allowed()

because Python has a lot of function calls overhead (like creating a new stack frame and argument tuples).

2) Run the code with pypy , which is really good at optimizing features like this one.

+2


source


Edit: I misunderstood the original problem, here the solution presented addresses the following problem:

Given 4 groups with 3 different cards with a different score for each card, we draw cards until we have selected 3 cards from the same group. What is the expected result (the sum of the points of the selected cards) at the end of the game.

I leave the solution as it is because it was so nice to work with after so many years of theory and less, but I just can't delete it :)

See my other answer for handling the original problem.


There are two possibilities to improve performance: speeding up your code (and before starting that, you need to profile to find out which part of the program needs to be optimized, otherwise time is wasted optimizing things that are not taken into account) or improving the algorithm. I suggest doing the second.

Ok, this problem seems to be more complex than the first site. Let's start with some observations.


All you need to know is the expected number of selected cards at the end of the game:

If Pi

is the probability that a card is i

chosen somewhere during the game, then we are looking for the expected value of the score E(Score)=P1*W1+P2*W2+...Pn*Wn

. However, if we look at the cards in a group, we can state that due to symmetry, the probabilities for the cards in that group are the same, for example. P1=P2=P3=:Pgrand

in your case. Thus, our expectation can be calculated:

E(Score)=3*Pgrand*(W1+W2+W3)/3+...+3*Pbonus*(W10+W11+W12)/3

      

Let's name averageWgrand:=(W1+W2+W3)/3

and keep in mind that E(#grand)=3*Pgrand

- the expected number of selected Grand Cards at the end of the game. In this case, our formula becomes:

E(Score)=E(#grand)*averageWgrand+...+E(#bonus)*averageWbonus

      

In your example, we can go even further: the number of cards in each group is, therefore, due to the symmetry we may require: E(#grand)=E(#minor)=E(#major)=E(#grand)=:(E#group)

. For simplicity, in what follows, we will consider only this particular case (but the presented solution can be extended to the general case). This leads to the following simplification:

E(Score)=4*E(#group)(averageWgrand+...+averageWbonus)/4

      

Let's name averageW:=(averageWgrand+...+averageWbonus)/4

and note that E(#cards)=4*E(#grand)

- the expected number of collected cards at the end of the game.

Thus E(Score)=E(#cards)*averageW

, therefore, our task is reduced to calculating the expected value of the number of cards at the end of the game:

 E(#cards)=P(1)*1+P(2)*2+...P(n)*n

      

where P(i)

denotes the probability that the game ends with accurate cards i

. Probabilities P(1)

, P(2)

and P(k)

, k>9

it's easy to see - they are 0

.


Calculating the probability of the game ending using the i

selected cards - P(i)

:

Let's play a slightly different game: we choose exactly the i

cards and win if and only if:

  • There is exactly one group with three chosen cards. We call this group full_group

    .
  • The last selected (i-th) card was from full_group

    .

It is easy to see that the probability of winning this game P(win)

is exactly the probability we are looking for P(i)

. Once again we can use symmetry because all groups are equal ( P(win, full=grand)

meaning the probability that we are what and what full_group=grand

):

 P(win)=P(win, grand)+P(win, minor)+P(win, major)+P(win, bonus)
       =4*P(win, grand)

      

P(win, grand)

- the probability that:

  • after selecting the cards, the i-1

    number of selected grand cards is 2, that is, `# grand = 2 'and
  • after choosing cards i-1

    , for each group the number of selected cards is less than 3 and
  • we choose the grand card in the last round. Given the first two constraints, this (conditional) probability is 1/(n-i+1)

    (there are cards n-i+1

    on the left, and only one of them is "right").

From the urn problem we know the probability for

P(#grand=u, #minor=x, #major=y, #bonus=z) = binom(3,u)*binom(3,x)*binom(3,y)*binom(3,z)/binom(12, u+x+y+z)

      

with binom(n,k)=n!/k!/(n-k)!

. Thus, P(win, grand)

it can be calculated as:

P(win, grand) = 1/(n-i+1)*sum P(#grand=2, #minor=x, #major=y, #bonus=z)
               where x<=2, y<=2, z<=2 and 2+x+y+z=i-1

      


And now the code:

import math

def binom(n,k):
  return math.factorial(n)//math.factorial(k)//math.factorial(n-k)

#expected number of cards:

n=12 #there are 12 cards
probs=[0]*n
for minor in xrange(3):
  for major in xrange(3):
     for bonus in xrange(3):
         i = 3 + minor +major +bonus 
         P_urn = binom(3,2)*binom(3,minor)*binom(3,major)*binom(3,bonus)/float(binom(n, n-i+1))
         P_right_last_card = 1.0/(n-i+1)
         probs[i]+=4*P_urn*P_right_last_card #factor 4 from symmetry


print "Expected number of cards:", sum((prob*card_cnt for card_cnt, prob in enumerate(probs)))

      

As a result, I get 6.94285714286

as the expected number of cards at the end of the game. And very quickly - almost instantly. Not sure if this is correct ...


Output:

Obviously, if you like to handle the more general case (more groups, number maps in another group), you should extend the code (recursion, memoization binom

) and theory.

But the most important part: With this approach, you (almost) don't care what order the cards were chosen in, and therefore the number of conditions you have to check is reduced at (k-1)!

, where the k

maximum number of cards is at the end of the game. In your example k=9

and therefore the approach is faster with a factor 40000

(I am not even considering acceleration from exploited symmetry, because that might not be possible in general).

0


source







All Articles