# Calculating the closest matches from combinations of array values

I have an array of lengths of parts, for examples: -

```
array(150, 180, 270);
```

Then I have a dimension `($a = 440)`

I need to calculate the two **closest** possible combinations of lengths that are greater than `$a`

that, without having to manually create hundreds of possible combinations for it to work.

So:

`150`

`180`

`270`

`150 + 150`

`150 + 180`

`150 + 270`

`180 + 180`

`180 + 270`

`270 + 270`

`150 + 150 + 150`

`150 + 150 + 180`

.. etc.

This needs to be done for a certain number of times, and not just find the first two matches and stop, as it `150 + 150 + 150`

will match **closer** to `$a`

than `270 + 270`

but can work after.

edit: I also need to store the combination of the details that made up the match, preferably in an array.

I hope I have explained this well enough for someone to understand.

source to share

Since this is a rather heavy script resource, I thought it would be nice to give the option to pre-shape the selection and then use that data to create a / object / sql script variable to persist the data. For example, doing something like

```
SELECT * FROM combination_total WHERE size > YOUR_SIZE ORDER BY size ASC LIMIT 2;
```

The new script I have is similar, but it just generates an array of all combinations without any duplicates. Seems pretty fast again. Note the $ maxLength variable, which is currently set to 2000, which can be changed with your largest size possible.

```
<?php
$partLengths = array(150, 180, 270);
$currentCombinations = array(
array(
'total' => 150,
'combination' => array(150)
),
array(
'total' => 180,
'combination' => array(180)
),
array(
'total' => 270,
'combination' => array(270)
)
);
$maxLength = 2000;
$largestSize = 0;
function generateCombination() {
global $currentCombinations, $largestSize, $partLengths;
$tmpCombinations = $currentCombinations;
foreach ($tmpCombinations as $combination) {
foreach ($partLengths as $partLength) {
$newCombination = $combination['combination'];
$newCombination[] = $partLength;
sort($newCombination);
$newCombinationTotal = array_sum($newCombination);
if (!combinationExists($newCombination)) {
$currentCombinations[] = array(
'total' => $newCombinationTotal,
'combination' => $newCombination
);
}
$largestSize = ($newCombinationTotal > $largestSize) ? $newCombinationTotal : $largestSize;
}
}
}
function combinationExists($combination) {
global $currentCombinations;
foreach ($currentCombinations as $currentCombination) {
if ($combination == $currentCombination['combination']) {
return true;
}
}
return false;
}
while ($largestSize < $maxLength) {
generateCombination();
}
// here you can use $currentCombinations to generate sql/object/etc
var_dump($currentCombinations);
?>
```

source to share

The following code is brute force and only checks possible combinations of two values, so I know it is incomplete. However, this is the beginning.

** UPDATE: See my other answer below for a much better solution that works with every possible combination, not just 2, and this is optimized.**

```
<?php
echo "<html><head><title>Test Array Sums</title></head><body>";
$testarray = array(2, 5, 9, 78, 332);
$target_value = 10;
$closest1 = 0;
$closest2 = 0;
$closest_sum = 0;
$closest_difference = 0;
$first_time_in_loop = TRUE;
foreach ($testarray AS $entry1)
{
foreach ($testarray AS $entry2)
{
if ($first_time_in_loop)
{
$first_time_in_loop = FALSE;
$closest1 = $entry1;
$closest2 = $entry2;
$closest_sum = $closest1 + $closest2;
$closest_difference = abs($target_value - $closest_sum);
}
$test_sum = $entry1 + $entry2;
if (abs($test_sum - $target_value) < $closest_difference)
{
if ($test_sum - $target_value >= 0)
{
// Definitely the best so far
$closest1 = $entry1;
$closest2 = $entry2;
$closest_sum = $closest1 + $closest2;
$closest_difference = abs($closest_sum - $target_value);
}
else if ($closest_sum - $target_value < 0)
{
// The sum isn't big enough, but neither was the previous best option
// and at least this is closer
$closest1 = $entry1;
$closest2 = $entry2;
$closest_sum = $closest1 + $closest2;
$closest_difference = abs($closest_sum - $target_value);
}
}
else
{
if ($closest_sum - $target_value < 0 && $test_sum - $target_value >= 0)
{
// $test_value is farther away from the target than the previous best option,
// but at least it bigger than the target value (the previous best option wasn't)
$closest1 = $entry1;
$closest2 = $entry2;
$closest_sum = $closest1 + $closest2;
$closest_difference = abs($closest_sum - $target_value);
}
}
}
}
echo "Best pair: " . $closest1 . ", " . $closest2 . "<br />";
echo "</body></html>";
?>
```

Can you limit the total number of test values to 3 - or some larger number - or do you really need to expand it to all possible combinations (i.e. if 4 + 4 + 5 + 4 + 4 + 5 + 3 + 5 + 4 + 5 + 3 + 4 is closer than 26 + 26 than you need to find?)

If you can limit the number of test subjects to, say, 5, then you can simply extend the loop above to handle up to 5 options. Otherwise, you need to write a more complex loop.

source to share

This code works with the closest combination above $ a, and the next closest one after it. It removes duplicates to speed things up a bit. It's not mega-optimized, but initial tests show that it's not that bad, depending on the initial value of $ a not being massive.

```
<?php
/* value in cm */
$a = 1020;
$partLengths = array(150, 180, 270);
$closestValue = array();
$secondClosest = array();
$currentCombinations = array(
array(
'total' => 150,
'combination' => array(150)
),
array(
'total' => 180,
'combination' => array(180)
),
array(
'total' => 270,
'combination' => array(270)
)
);
function getCombinations(&$currentCombinations, $partLengths,$a, &$closestValue, &$secondClosest) {
$tmpCombinations = $currentCombinations;
static $secondMatch = true;
for ($x=0;$x<count($partLengths);$x++) {
for ($y=0;$y<count($tmpCombinations);$y++) {
$newCombination = $tmpCombinations[$y]['combination'];
$newCombination[] = $partLengths[$x];
$newCombinationTotal = array_sum($newCombination);
sort($newCombination);
if (!combinationExists($currentCombinations, $newCombination, $newCombinationTotal)) {
$currentCombinations[] = array('total' => $newCombinationTotal, 'combination' => $newCombination);
}
if ($closestValue['total'] < $a) {
$oldGap = $a - $closestValue['total'];
$newGap = $a - $newCombinationTotal;
$newGap = ($newGap < 0) ? 0 - $newGap : $newGap;
if ($newGap < $oldGap) {
$secondClosest = $closestValue;
$closestValue['total'] = $newCombinationTotal;
$closestValue['combination'] = $newCombination;
}
} else {
$oldGap = $a - $secondClosest['total'];
$newGap = $a - $newCombinationTotal;
$oldGap = ($oldGap < 0) ? 0 - $oldGap : $oldGap;
$newGap = ($newGap < 0) ? 0 - $newGap : $newGap;
if ($newCombinationTotal > $a && $newCombinationTotal > $closestValue['total']) {
if ($secondMatch || $newGap < $oldGap) {
$secondMatch = false;
$secondClosest['total'] = $newCombinationTotal;
$secondClosest['combination'] = $newCombination;
}
}
}
}
}
}
function combinationExists(&$currentCombinations, $newCombination, $newCombinationTotal) {
foreach ($currentCombinations as $currentCombination) {
if ($currentCombination['total'] != $newCombinationTotal && $currentCombination['combination'] != $newCombination) {
return false;
}
}
return false;
}
while ($secondClosest['total'] <= $a) {
getCombinations($currentCombinations, $partLengths, $a, $closestValue, $secondClosest);
}
var_dump($closestValue);
var_dump($secondClosest);
?>
```

The next suggestion if speed becomes an issue is to pre-generate all combinations and store them as a hash / database / etc that you can easily access.

source to share

Improving on my previous answer, here is a version that works to check for any number of records, up to the maximum number.

**UPDATE** : (added optimization, see comments below)

For example, if the desired value `15`

and the list `(1, 17, 20)`

, the best choice is `1+1+1+1+1+1+1+1+1+1+1+1+1+1+1`

, so you will need to allow `$max_loops`

below at least `15`

to find this match - even though there are only 3 values in the list! This is worse for `(1, 133, 138)`

where the desired value is, say `130`

. In this case, you need **130** recursions! You can see that this could be an optimization nightmare. But the algorithm below works and is quite well optimized.

```
<?php
echo "<html><head><title>Test Array Sums</title></head><body>";
$testarray = array(1, 3, 6);
$target_value = 10;
$current_closest_sum = 0;
$current_closest_difference = 0;
$first_time_in_loop = TRUE;
$max_loops = 10;
$current_loop = 0;
$best_set = array();
$current_set = array();
$sums_already_evaluated = array();
function nestedLoop($current_test = 0)
{
global $testarray, $target_value, $current_closest_sum, $current_closest_difference, $first_time_in_loop, $max_loops, $current_loop, $best_set, $current_set, $sums_already_evaluated;
++$current_loop;
foreach ($testarray AS $entry)
{
$current_set_temp = $current_set;
$current_set[] = $entry;
if ($first_time_in_loop)
{
$first_time_in_loop = FALSE;
$current_closest_sum = $entry + $current_test;
$current_closest_difference = abs($target_value - $current_closest_sum);
$best_set[] = $entry;
}
$test_sum = $entry + $current_test;
if (in_array($test_sum, $sums_already_evaluated))
{
// no need to test a sum that has already been tested
$current_set = $current_set_temp;
continue;
}
$sums_already_evaluated[] = $test_sum;
if ($test_sum > $target_value && $current_closest_sum > $target_value && $test_sum >= $current_closest_sum)
{
// No need to evaluate a sum that is certainly worse even by itself
$current_set = $current_set_temp;
continue;
}
$set_best = FALSE;
if (abs($test_sum - $target_value) < $current_closest_difference)
{
if ($test_sum - $target_value >= 0)
{
// Definitely the best so far
$set_best = TRUE;
}
else if ($current_closest_sum - $target_value < 0)
{
// The sum isn't big enough, but neither was the previous best option
// and at least this is closer
$set_best = TRUE;
}
}
else
{
if ($current_closest_sum - $target_value < 0 && $test_sum - $target_value >= 0)
{
// $test_value is farther away from the target than the previous best option,
// but at least it bigger than the target value (the previous best option wasn't)
$set_best = TRUE;
}
}
if ($set_best)
{
$current_closest_sum = $test_sum;
$current_closest_difference = abs($current_closest_sum - $target_value);
$best_set = $current_set;
}
if ($current_loop < $max_loops)
{
if ($test_sum - $target_value < 0)
{
nestedLoop($test_sum);
}
}
$current_set = $current_set_temp;
}
--$current_loop;
}
// make array unique
$testarray = array_unique($testarray);
rsort($testarray, SORT_NUMERIC);
// Enter the recursion
nestedLoop();
echo "Best set: ";
foreach ($best_set AS $best_set_entry)
{
echo $best_set_entry . " ";
}
echo "<br />";
echo "</body></html>";
?>
```

UPDATE: I've added two small optimizations that seem to help a lot, and avoid memory overload or hash table lookups. It:

(1) Track all previously assessed amounts and do not re-evaluate them.

(2) If the sum (by itself) is already worse than the previous test, skip further tests with that amount.

I think that with these two optimizations, the algorithm might work well enough for realistic use in your situation.

** PREVIOUS COMMENTS BELOW, NOW RELATED INSUFFICIENCY**

My previous comments below are somewhat controversial because these two optimizations seem to work very well. But I add comments anyway.

Unfortunately, as noted, the above loop is highly optimized. It will need to be optimized to work in a real-world situation, avoiding repeated tests (and other optimizations). However, it demonstrates an algorithm that works.

Note that this is a difficult area mathematically. Different optimizations can help one scenario, but not another. Therefore, for the algorithm to work above, you will need to discuss realistic use cases. Will there be a maximum length limit in the parts list? What is the length range? And other, more subtle details of the parts list and the desired goal, albeit subtle, can greatly affect how an algorithm is optimized.

This is the case where the "theoretical" problem is not enough to get the desired solution, since optimization is so critical. Therefore, it is especially useful to make suggestions for optimization.

Leonard's optimizations, for example (avoiding duplicates, keeping all previously tested combinations) works well for a small set, but memory usage will explode for large sets (as he noted). This is not an easy problem.

(code edited ~ 2 hours later to handle possible missing combination due to recursion limitation to a certain number of recursions - first sorting the array from high to low)

source to share