Python - get index of combination

I need to generate a combination from "a" to "]]]]]]", for example I use this python script for that which works well.

import itertools

DATA_ALPHA_NUM = 
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789&-
()@=+;/!%$\\'\",.<>*^{}#~_[]"
b = 10

for i in range(1,int(b)+1):
   for e in itertools.combinations(DATA_ALPHA_NUM,i): print(''.join(e))

      

But now I need to do the opposite, for example: if I give "1" to a new script, it will output "a", if I give 90, it will output "]", etc.

I wrote some scripts that worked great for less than 737191 combinations, but they weren't good after that.

EDIT : someone will write something like this and then remove it. It was almost perfect.

alphaNumList = list(DATA_ALPHA_NUM)
alphaNumList.insert(0, "")

result = ["".join(item) for item in 
itertools.islice(itertools.product(alphaNumList, repeat=6), 0,1)]

      

+3


source to share


2 answers


Overview

The key to this is going through the cumulative combinations before reaching the index.

Decision

from math import factorial

def comb(n, r):
    'Combinations of n things taken r at a time'
    return factorial(n) // (factorial(r) * factorial(n - r))

def nth_combination(population, r, index):
    'Equivalent to list(combinations(population, r))[index]'
    n = len(population)
    c = comb(n, r)
    if index < 0:
        index += c
    if index < 0 or index >= c:
        raise IndexError
    if r < 0 or r > n:
        raise ValueError
    result = []
    while r:
        c, n, r = c*r//n, n-1, r-1
        while index >= c:
            index -= c
            c, n = c*(n-r)//n, n-1
        result.append(population[-1-n])
    return tuple(result)

      

Optimization

If speed is a concern, it is possible to build a faster version of the comb () function.

One way is to precompress the factorials and then search for them as needed:

fact = [1]
for i in range(1, 1000):
    fact.append(fact[-1] * i)

def comb(n, r):
    return fact[n] // (fact[r] * fact[n - r])

      

And there is another way that avoids large factorials altogether and does not require additional storage:

def comb(n, r):
    c = 1
    r = min(r, n-r)
    for i in range(1, r+1):
        c = c * (n - r + i) // i
    return c

      



How it works

Start by breaking down the combinations into component groups:

def groups(n, r):
    return [comb(n-i-1, r-1) for i in range(n-r+1)]

>>> comb(8, 3)
56
>>> groups(8, 3)
[21, 15, 10, 6, 3, 1]

      

This means that when executed , there are 56 combinations itertools.combinations('ABCDEFGH', 3)

for n=8

letters received r=3

at one time. The first 21 starts with A

, and the next 15 starts with B

, the next 10 starts with C

, the next 6 starts with D

, the next 3 starts with E

, and the last 1 starts with F

.

Say that you want to find the 25th combination out of 56. This belongs to the second group, so your first letter B

.

And since 25 - 21 - 4, you will need to find the fourth combination in 15 members of the "B" group being defined itertools.combinations('CDEFGH', 2)

. Just repeat the above process until all letters have been extracted.

Testing

In this test, to make sure it gives the expected results:

from itertools import combinations

population = 'ABCDEFGH'
for r in range(len(population) + 1):
    for i, expected in enumerate(combinations(population, r)):
        actual = locate(list(population), r, i)
        assert expected == actual

      

+2


source


You don't want COMBINATIONS. Indeed, you want "aa". But with combinations, since you never choose TWICE the same element, that won't happen.

So here is the correct version with a "cumulative product", indeed, since Raymond does with combinations, I have to calculate (90, 90 + 90 ** 2, 90 + 90 ** 2 + 90 ** 3, ...) to find what good power matches the combination I'm tracking.

Note that it is not optimized as I am slicing the product ... just to get one value!



import itertools

alphaNumList = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789&-()@=+;/!%$\\'\",.<>*^{}#~_[]")

cumulative = [len(alphaNumList)]
for i in range(1, 10):
    cumulative.append(len(alphaNumList) ** (i+1) + cumulative[i - 1])


def getCombiFromIndex(combiNumber):
    p = 0
    while cumulative[p] < combiNumber:
        p += 1  # WARNING : not robust to combi greater than (10,90) in my case :)
    rest = combiNumber - 1 - (cumulative[p - 1] if p > 0 else 0)
    return "".join([item for item in itertools.islice(itertools.product(alphaNumList, repeat=p + 1), rest, rest + 1)][0])


print(getCombiFromIndex(1))  # "a"
print(getCombiFromIndex(90))  # "]"
print(getCombiFromIndex(91))  # "aa"
print(getCombiFromIndex(800064))  # "ah+1"

      

UPDATE: I added a method to fetch a list between two indices based on the same concept, but in this case it is better to use a slice :)

def getCombiListBetween(fromNumber, toNumber):
    combiList = []
    if fromNumber < toNumber:
        pF, pT = 0, 0
        while cumulative[pF] < fromNumber:
            pF += 1
        while cumulative[pT] < toNumber:
            pT += 1
        start = fromNumber - 1 - (cumulative[pF - 1] if pF > 0 else 0)
        end = toNumber - 1
        for p in range(pF, pT + 1):
            combiList += ["".join(item) for item in itertools.islice(itertools.product(alphaNumList, repeat=p + 1), start, min(cumulative[p], end) + 1)]
            start = 0
            end -= cumulative[p]
    else:
        combiList.append(getCombiFromIndex(fromNumber))
    return combiList

print(getCombiListBetween(8189, 8191)) # ['][', ']]', 'aaa']

      

+1


source







All Articles