Calculating deciles / quintiles from an array using PHP / mySQL

I am getting data using a query like this:

SELECT `product_name`, `Number of Cases` FROM `myTable` where `color` = 'red' order by `Number of Cases` DESC";

      

I am returning an array like this if I var_dump

:

array(2) { 
[0]=> array(4) { ["product_name"]=> string(5) "23418" [0]=> string(5) "23418" ["Number of Cases"]=> string(3) "434" [1]=> string(3) "434" } 
[1]=> array(4) { ["product_name"]=> string(5) "79746" [0]=> string(5) "79746" ["Number of Cases"]=> string(3) "372" [1]=> string(3) "372" }
}

      

Above example, but obviously I have a lot more rows in a real table.

Here's what I want to do ...

Get all values ​​for "Number of cases" and divide them into quintiles (5 buckets with equal number of values ​​in each "bucket") or deciles (10 buckets with the same number of values ​​in each "bucket"). I need to return the maximum value (for " number of cases ") for each quintile or decile.

I see that the stream looks like this:

  • order values ​​from "number of cases"
  • number of return values ​​(rows)
  • are divided into equal "buckets" (for example, if there are 100 lines and I need deciles {10 equal buckets), then each bucket will have 10 lines.
  • Get / return the highest value in each bucket

I know the steps, but I am unable to translate them into working PHP.

Any ideas will help.

Thank.

+3


source to share


2 answers


You can add a column tile

to data in MySQL using variables. Here's an example for both quintiles and deciles:

SELECT t.*,
       floor((seqnum - 1) * 5 / @rn) as tile_5,
       floor((seqnum - 1) * 10 / @rn) as tile_10
FROM (SELECT `product_name`, `Number of Cases`, (@rn := @rn + 1) as seqnum
      FROM `myTable` CROSS JOIN
           (SELECT @rn := 0) params
      WHERE `color` = 'red'
      ORDER BY `Number of Cases` DESC
     ) t

      

EDIT:



The above should work, but maybe something funky is happening to @rn

. If yes:

SELECT t.*,
       floor((seqnum - 1) * 5 / x.cnt) as tile_5,
       floor((seqnum - 1) * 10 / x.cnt) as tile_10
FROM (SELECT `product_name`, `Number of Cases`, (@rn := @rn + 1) as seqnum
      FROM `myTable` CROSS JOIN
           (SELECT @rn := 0) params
      WHERE `color` = 'red'
      ORDER BY `Number of Cases` DESC
     ) t CROSS JOIN
     (SELECT COUNT(*) as cnt FROM myTable WHERE color = 'red') x;

      

+1


source


I ended up with something like this for my query, which is pretty fast even on a decent sized table:

SELECT
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases`SEPARATOR ','),
',', 20/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `20th`,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases`SEPARATOR ','),
',', 40/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `40th`,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases`SEPARATOR ','),
',', 60/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `60th`,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases` SEPARATOR ','),
',', 80/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `80th`
 FROM `myTable` where `color`="red"

      

This example returns 4 numbers at the 20%, 40%, 60%, 80% percentile. The four numbers are the string value ( number of cases

) for a particular percentile.



Note that you will probably have to increase yours group_concat_max_len

for this to work ... something like this:

$sessiont = 'SET SESSION group_concat_max_len = 10485760;';
$fixed = $dbh->query($sessiont); 

      

Chat tip on this blog for a solution: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/

0


source







All Articles