Calculating deciles / quintiles from an array using PHP / mySQL
I am getting data using a query like this:
SELECT `product_name`, `Number of Cases` FROM `myTable` where `color` = 'red' order by `Number of Cases` DESC";
I am returning an array like this if I var_dump
:
array(2) {
[0]=> array(4) { ["product_name"]=> string(5) "23418" [0]=> string(5) "23418" ["Number of Cases"]=> string(3) "434" [1]=> string(3) "434" }
[1]=> array(4) { ["product_name"]=> string(5) "79746" [0]=> string(5) "79746" ["Number of Cases"]=> string(3) "372" [1]=> string(3) "372" }
}
Above example, but obviously I have a lot more rows in a real table.
Here's what I want to do ...
Get all values ββfor "Number of cases" and divide them into quintiles (5 buckets with equal number of values ββin each "bucket") or deciles (10 buckets with the same number of values ββin each "bucket"). I need to return the maximum value (for " number of cases ") for each quintile or decile.
I see that the stream looks like this:
- order values ββfrom "number of cases"
- number of return values ββ(rows)
- are divided into equal "buckets" (for example, if there are 100 lines and I need deciles {10 equal buckets), then each bucket will have 10 lines.
- Get / return the highest value in each bucket
I know the steps, but I am unable to translate them into working PHP.
Any ideas will help.
Thank.
source to share
You can add a column tile
to data in MySQL using variables. Here's an example for both quintiles and deciles:
SELECT t.*,
floor((seqnum - 1) * 5 / @rn) as tile_5,
floor((seqnum - 1) * 10 / @rn) as tile_10
FROM (SELECT `product_name`, `Number of Cases`, (@rn := @rn + 1) as seqnum
FROM `myTable` CROSS JOIN
(SELECT @rn := 0) params
WHERE `color` = 'red'
ORDER BY `Number of Cases` DESC
) t
EDIT:
The above should work, but maybe something funky is happening to @rn
. If yes:
SELECT t.*,
floor((seqnum - 1) * 5 / x.cnt) as tile_5,
floor((seqnum - 1) * 10 / x.cnt) as tile_10
FROM (SELECT `product_name`, `Number of Cases`, (@rn := @rn + 1) as seqnum
FROM `myTable` CROSS JOIN
(SELECT @rn := 0) params
WHERE `color` = 'red'
ORDER BY `Number of Cases` DESC
) t CROSS JOIN
(SELECT COUNT(*) as cnt FROM myTable WHERE color = 'red') x;
source to share
I ended up with something like this for my query, which is pretty fast even on a decent sized table:
SELECT
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases`SEPARATOR ','),
',', 20/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `20th`,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases`SEPARATOR ','),
',', 40/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `40th`,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases`SEPARATOR ','),
',', 60/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `60th`,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(
GROUP_CONCAT(`number of cases` ORDER BY `number of cases` SEPARATOR ','),
',', 80/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) AS `80th`
FROM `myTable` where `color`="red"
This example returns 4 numbers at the 20%, 40%, 60%, 80% percentile. The four numbers are the string value ( number of cases
) for a particular percentile.
Note that you will probably have to increase yours group_concat_max_len
for this to work ... something like this:
$sessiont = 'SET SESSION group_concat_max_len = 10485760;';
$fixed = $dbh->query($sessiont);
Chat tip on this blog for a solution: http://web.performancerasta.com/metrics-tips-calculating-95th-99th-or-any-percentile-with-single-mysql-query/
source to share