Python - get index of combination
I need to generate a combination from "a" to "]]]]]]", for example I use this python script for that which works well.
import itertools
DATA_ALPHA_NUM =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789&-
()@=+;/!%$\\'\",.<>*^{}#~_[]"
b = 10
for i in range(1,int(b)+1):
for e in itertools.combinations(DATA_ALPHA_NUM,i): print(''.join(e))
But now I need to do the opposite, for example: if I give "1" to a new script, it will output "a", if I give 90, it will output "]", etc.
I wrote some scripts that worked great for less than 737191 combinations, but they weren't good after that.
EDIT : someone will write something like this and then remove it. It was almost perfect.
alphaNumList = list(DATA_ALPHA_NUM)
alphaNumList.insert(0, "")
result = ["".join(item) for item in
itertools.islice(itertools.product(alphaNumList, repeat=6), 0,1)]
source to share
Overview
The key to this is going through the cumulative combinations before reaching the index.
Decision
from math import factorial
def comb(n, r):
'Combinations of n things taken r at a time'
return factorial(n) // (factorial(r) * factorial(n - r))
def nth_combination(population, r, index):
'Equivalent to list(combinations(population, r))[index]'
n = len(population)
c = comb(n, r)
if index < 0:
index += c
if index < 0 or index >= c:
raise IndexError
if r < 0 or r > n:
raise ValueError
result = []
while r:
c, n, r = c*r//n, n-1, r-1
while index >= c:
index -= c
c, n = c*(n-r)//n, n-1
result.append(population[-1-n])
return tuple(result)
Optimization
If speed is a concern, it is possible to build a faster version of the comb () function.
One way is to precompress the factorials and then search for them as needed:
fact = [1]
for i in range(1, 1000):
fact.append(fact[-1] * i)
def comb(n, r):
return fact[n] // (fact[r] * fact[n - r])
And there is another way that avoids large factorials altogether and does not require additional storage:
def comb(n, r):
c = 1
r = min(r, n-r)
for i in range(1, r+1):
c = c * (n - r + i) // i
return c
How it works
Start by breaking down the combinations into component groups:
def groups(n, r):
return [comb(n-i-1, r-1) for i in range(n-r+1)]
>>> comb(8, 3)
56
>>> groups(8, 3)
[21, 15, 10, 6, 3, 1]
This means that when executed , there are 56 combinations itertools.combinations('ABCDEFGH', 3)
for n=8
letters received r=3
at one time. The first 21 starts with A
, and the next 15 starts with B
, the next 10 starts with C
, the next 6 starts with D
, the next 3 starts with E
, and the last 1 starts with F
.
Say that you want to find the 25th combination out of 56. This belongs to the second group, so your first letter B
.
And since 25 - 21 - 4, you will need to find the fourth combination in 15 members of the "B" group being defined itertools.combinations('CDEFGH', 2)
. Just repeat the above process until all letters have been extracted.
Testing
In this test, to make sure it gives the expected results:
from itertools import combinations
population = 'ABCDEFGH'
for r in range(len(population) + 1):
for i, expected in enumerate(combinations(population, r)):
actual = locate(list(population), r, i)
assert expected == actual
source to share
You don't want COMBINATIONS. Indeed, you want "aa". But with combinations, since you never choose TWICE the same element, that won't happen.
So here is the correct version with a "cumulative product", indeed, since Raymond does with combinations, I have to calculate (90, 90 + 90 ** 2, 90 + 90 ** 2 + 90 ** 3, ...) to find what good power matches the combination I'm tracking.
Note that it is not optimized as I am slicing the product ... just to get one value!
import itertools
alphaNumList = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789&-()@=+;/!%$\\'\",.<>*^{}#~_[]")
cumulative = [len(alphaNumList)]
for i in range(1, 10):
cumulative.append(len(alphaNumList) ** (i+1) + cumulative[i - 1])
def getCombiFromIndex(combiNumber):
p = 0
while cumulative[p] < combiNumber:
p += 1 # WARNING : not robust to combi greater than (10,90) in my case :)
rest = combiNumber - 1 - (cumulative[p - 1] if p > 0 else 0)
return "".join([item for item in itertools.islice(itertools.product(alphaNumList, repeat=p + 1), rest, rest + 1)][0])
print(getCombiFromIndex(1)) # "a"
print(getCombiFromIndex(90)) # "]"
print(getCombiFromIndex(91)) # "aa"
print(getCombiFromIndex(800064)) # "ah+1"
UPDATE: I added a method to fetch a list between two indices based on the same concept, but in this case it is better to use a slice :)
def getCombiListBetween(fromNumber, toNumber):
combiList = []
if fromNumber < toNumber:
pF, pT = 0, 0
while cumulative[pF] < fromNumber:
pF += 1
while cumulative[pT] < toNumber:
pT += 1
start = fromNumber - 1 - (cumulative[pF - 1] if pF > 0 else 0)
end = toNumber - 1
for p in range(pF, pT + 1):
combiList += ["".join(item) for item in itertools.islice(itertools.product(alphaNumList, repeat=p + 1), start, min(cumulative[p], end) + 1)]
start = 0
end -= cumulative[p]
else:
combiList.append(getCombiFromIndex(fromNumber))
return combiList
print(getCombiListBetween(8189, 8191)) # ['][', ']]', 'aaa']
source to share