Get the top and bottom 25th percentile
I have a table listing employees and the number of units they sold.
I want to get 25 percent Avg units sold, and 25 percent Avg units sold and sold.
I created my data view SLQ Fiddle
I really don't know how to start with this? All examples I see are for SQL Server, not MySQL. That's what I think.
I want 25 percent and can't limit 25 points. This will mainly include:
1) #_of_employees = The number of total employees.
2) #_of_employees_in_25_percentile = #_of_employees*0.25
3) Calculate the sum of the units sold by the top/bottom 25 percentile (limit #_of_employees_in_25_percentile)
4) Divide the sum by #_of_employees_in_25_percentile to get the average.
How can all of this be done efficiently in MySQL?
source to share
This is a solution that uses a tricky trick I learned from this question .
SELECT id, unit_sold, n * 100 / @total AS percentile
FROM (
SELECT id, unit_sold, @total := @total + unit_sold AS n
FROM mydata, (SELECT @total := 0) AS total
ORDER BY unit_sold ASC
) AS t
source to share
How about going with this logic:
- Select all, order by percentile (DESC), limit to 25
- Select all, order by percentile (ASC), limit to 25
Is this the type of logic you are looking for?
Examples of requests:
$q1 = mysql_query(SELECT * FROM table_name ORDER BY percentile DESC LIMIT 25)
$q2 = mysql_query(SELECT * FROM table_name ORDER BY percentile ASC LIMIT 25)
source to share
How about this?
SELECT
SUM(unit_sold) AS sum_tot, SUM(unit_sold)/count(id) AS average,
SUM(CASE WHEN percentile<25 THEN unit_sold ELSE 0 END) AS sum_top25,
SUM(CASE WHEN percentile<25 THEN 1 ELSE 0 END) AS count_top25,
SUM(CASE WHEN percentile<25 THEN unit_sold ELSE 0 END)/SUM(CASE WHEN percentile<25 THEN 1 ELSE 0 END) AS average_top25,
SUM(CASE WHEN percentile>75 THEN unit_sold ELSE 0 END) AS sum_bottom25,
SUM(CASE WHEN percentile>75 THEN 1 ELSE 0 END) AS count_bottom25,
SUM(CASE WHEN percentile>75 THEN unit_sold ELSE 0 END)/SUM(CASE WHEN percentile>75 THEN 1 ELSE 0 END) AS average_bottom25
FROM
(SELECT
id, unit_sold, c * 100 / @counter AS percentile
FROM
(SELECT
m.*, @counter:=@counter+1 AS c
FROM
(SELECT @counter:=0) AS initvar, mydata AS m
ORDER BY unit_sold desc
) AS t
WHERE
c <= (25/100 * @counter)
OR c >= (75/100 * @counter)
) AS t2
Output:
SUM_TOT AVERAGE SUM_TOP25 COUNT_TOP25 AVERAGE_TOP25 SUM_BOTTOM25 COUNT_BOTTOM25 AVERAGE_BOTTOM25
850 283.3333 500 1 500 350 2 175
See SQL Fiddle .
The idea is to use MySQL: LIMIT on a percentage of the number of records? to get the percentiles. Based on this (and pdw's answer ), we create an output where we only show 25% and bottom 75%.
Finally, we count and sum to get the requested values.
Note that this runs on top of the command:
SELECT
id, unit_sold, c * 100 / @counter AS percentile
FROM
(SELECT
m.*, @counter:=@counter+1 AS c
FROM
(SELECT @counter:=0) AS initvar, mydata AS m
ORDER BY unit_sold desc
) AS t
WHERE
c <= (25/100 * @counter)
OR c >= (75/100 * @counter)
Whose output:
ID UNIT_SOLD PERCENTILE
d 500 20
a 250 80
e 100 100
source to share