How do I create combinations of multiple lists without hardcoding outlines?
I have data that looks like this:
my @homopol = (
["T","C","CC","G"], # part1
["T","TT","C","G","A"], #part2
["C","CCC","G"], #part3 ...upto part K=~50
);
my @prob = ([1.00,0.63,0.002,1.00,0.83],
[0.72,0.03,1.00, 0.85,1.00],
[1.00,0.97,0.02]);
# Note also that the dimension of @homopol is always exactly the same with @prob.
# Although number of elements can differ from 'part' to 'part'.
What I want to do is
- Generate all combinations of elements in
part1
via outpartK
- Find the product of the matching elements in
@prob
.
Hence, in the end, we hope to get this result:
T-T-C 1 x 0.72 x 1 = 0.720 T-T-CCC 1 x 0.72 x 0.97 = 0.698 T-T-G 1 x 0.72 x 0.02 = 0.014 ... G-G-G 1 x 0.85 x 0.02 = 0.017 G-A-C 1 x 1 x 1 = 1.000 G-A-CCC 1 x 1 x 0.97 = 0.970 G-A-G 1 x 1 x 0.02 = 0.020
The problem is that the following code of mine does it by hardcoding the loop. Since the number of parts @homopol
can be varied and large (for example, ~ K = 50), we need a flexible and compact way to get the same result. Whether there is a? I was thinking to use Algorithm :: Loops , but don't know how to achieve this.
use strict;
use Data::Dumper;
use Carp;
my @homopol = (["T","C","CC","G"],
["T","TT","C","G","A"],
["C","CCC","G"]);
my @prob = ([1.00,0.63,0.002,1.00,0.83],
[0.72,0.03,1.00, 0.85,1.00],
[1.00,0.97,0.02]);
my $i_of_part1 = -1;
foreach my $base_part1 ( @{ $homopol[0] } ) {
$i_of_part1++;
my $probpart1 = $prob[0]->[$i_of_part1];
my $i_of_part2 =-1;
foreach my $base_part2 ( @{ $homopol[1] } ) {
$i_of_part2++;
my $probpart2 = $prob[1]->[$i_of_part2];
my $i_of_part3 = -1;
foreach my $base_part3 ( @{ $homopol[2] } ) {
$i_of_part3++;
my $probpart3 = $prob[2]->[$i_of_part3];
my $nstr = $base_part1."".$base_part2."".$base_part3;
my $prob_prod = sprintf("%.3f",$probpart1 * $probpart2 *$probpart3);
print "$base_part1-$base_part2-$base_part3 \t";
print "$probpart1 x $probpart2 x $probpart3 = $prob_prod\n";
}
}
}
source to share
I would recommend Set::CrossProduct
which will create an iterator to get the cross product of all your sets. Since it uses an iterator, it doesn't need to generate every combination beforehand; rather, he gives to everyone on demand.
use strict;
use warnings;
use Set::CrossProduct;
my @homopol = (
[qw(T C CC G)],
[qw(T TT C G A)],
[qw(C CCC G)],
);
my @prob = (
[1.00,0.63,0.002,1.00],
[0.72,0.03,1.00, 0.85,1.00],
[1.00,0.97,0.02],
);
# Prepare by storing the data in a list of lists of pairs.
my @combined;
for my $i (0 .. $#homopol){
push @combined, [];
push @{$combined[-1]}, [$homopol[$i][$_], $prob[$i][$_]]
for 0 .. @{$homopol[$i]} - 1;
};
my $iterator = Set::CrossProduct->new([ @combined ]);
while( my $tuple = $iterator->get ){
my @h = map { $_->[0] } @$tuple;
my @p = map { $_->[1] } @$tuple;
my $product = 1;
$product *= $_ for @p;
print join('-', @h), ' ', join(' x ', @p), ' = ', $product, "\n";
}
source to share
A solution using Algorithm :: Loops without changing the input would look something like this:
use Algorithm::Loops;
# Turns ([a, b, c], [d, e], ...) into ([0, 1, 2], [0, 1], ...)
my @lists_of_indices = map { [ 0 .. @$_ ] } @homopol;
NestedLoops( [ @lists_of_indices ], sub {
my @indices = @_;
my $prob_prod = 1; # Multiplicative identity
my @base_string;
my @prob_string;
for my $n (0 .. $#indices) {
push @base_string, $hompol[$n][ $indices[$n] ];
push @prob_string, sprintf("%.3f", $prob[$n][ $indices[$n] ]);
$prob_prod *= $prob[$n][ $indices[$n] ];
}
print join "-", @base_string; print "\t";
print join "x", @prob_string; print " = ";
printf "%.3f\n", $prob_prod;
});
But I think that you could make the code clearer by changing the structure to another one like
[
{ T => 1.00, C => 0.63, CC => 0.002, G => 0.83 },
{ T => 0.72, TT => 0.03, ... },
...
]
because without parallel data structures, you can just iterate over the available underlying sequences, rather than iterate over the indices, and then look for those indices in two different places.
source to share
you can do this by creating an array of pointers the same length as @homopol (N say) array to keep track of which combination you are looking at. In fact, this array is similar to a number in base N, with the elements being digits. Iterate the same way as you write consectutive numbers in base N, for example (0 0 0 ... 0), (0 0 0 ... 1), ..., (0 0 0 ... N- 1), (0 0 0 ... 1 0), ....
source to share
Approach 1: Calculation by indices
Calculate the product of the lengths in the homofield (length1 * length2 * ... * lengthN). Then I iterate from zero to product. Now you need the indices i% length1, (i / length1)% length2, (i / length1 / length2)% length3, ...
Approach 2: Recursion
I got beat up, see nikie's answer. :-)
source to share