Select for averages based on different date ranges in one MySQL query
Basically I am trying to plot a chart with this data. I can put my query in a while loop in PHP to get the average, but I would prefer it done with a single query creating one table of results.
<?php
date_default_timezone_set('America/Los_Angeles');
include('../connect.php');
$subcategory = 'T-Shirts';
$date = date('Y-m-d', strtotime('-29 days'));
$today = date("Y-m-d");
$subcategory = mysqli_real_escape_string($conp, $subcategory);
echo "<table border=\"1\">";
echo "<tr>";
echo "<th>date</th>";
echo "<th>average</th>";
echo "</tr>";
while (strtotime($date) <= strtotime($today)) {
$from_date = date ("Y-m-d", strtotime("-29 day", strtotime($date)));
$query = $conp->query("SELECT ROUND(SUM(OutCount)/30) AS 'average' FROM inventory
LEFT JOIN item
ON inventory.itemcode = item.itemcode
WHERE item.subcategory = '$subcategory'
AND TrDateTime BETWEEN '$from_date' AND '$date' AND transactiontype like 'OUT_%'");
if($query->num_rows){
while($row = mysqli_fetch_array($query, MYSQL_ASSOC)){
if(!empty($row['average'])){
$average = $row['average'];
}else{
$average = "N/A";
}
}
mysqli_free_result($query);
}else{
$average = "N/A";
}
$date = date ("Y-m-d", strtotime("+1 day", strtotime($date)));
echo "<tr>";
echo "<td>" . $date . "</td>";
echo "<td>" . $average . "</td>";
echo "</tr>";
}
echo "</table>";
?>
I get all dates for the last 30 days (including today) and average sales 29 days before that date.
+------------+----------+
| date | average |
+------------+----------+
| 2015-04-09 | 222 |
| 2015-04-10 | 225 |
| 2015-04-11 | 219 |
| ... | ... |
+------------+----------+
I can get everything I need, but 29 queries work in this situation and MySQL will be significantly faster. I started to come up with a MySQL procedure, but I'm not sure how well it will work when I try to call it using PHP.
DELIMITER //
CREATE PROCEDURE average_daily_sales()
BEGIN
SET @today = CURDATE();
SET @date_var = CURDATE() - INTERVAL 29 DAY;
SET @from_date = @date_var - INTERVAL 29 DAY;
SET @to_date = @from_date + INTERVAL 29 DAY;
label1: WHILE @date_var < @today DO
SELECT DATE_FORMAT(trdatetime, '%Y-%m-%d') as 'date', ROUND(SUM(OutCount)/30) AS 'average'
FROM inventory
LEFT JOIN item
ON inventory.itemcode = item.itemcode
WHERE item.subcategory = 'T-Shirts'
AND trdatetime BETWEEN @from_date - INTERVAL 29 DAY AND @to_date
AND transactiontype like 'OUT_%';
SET @date_var = @date_var + INTERVAL 1 DAY;
END WHILE label1;
END; //
DELIMITER ;
Ultimately, I would prefer a regular MySQL statement that I can use to create the results table I want in one shot. Any help would be greatly appreciated.
source to share
In the suggestions from @OllieJones and @ spencer7593 either required a "transaction" every day, in order to use SELECT DISTINCT DATE(trdatetime)
, you needed to create another table, or you needed to create a derived table.
SELECT DISTINCT DATE(trdatetime)
was not an option for me, because I did not have transactions for every day.
The hybrid PHP and MySQL example that @ spencer7593 suggested would generate the derived table very well. As a result, the static version took about 1.8 seconds to get the result. The problem is that you will need additional PHP to generate this ... (see @ Spencer7593's answer)
SELECT cal.dt
, ( -- correlated subquery references value returned from cal
SELECT ROUND(SUM(n.OutCount)/30)
FROM inventory n
JOIN item t
ON t.itemcode = n.itemcode
WHERE t.subcategory = 'foo'
AND n.TrDateTime >= cal.dt + INTERVAL -28 DAY
AND n.TrDateTime < cal.dt + INTERVAL 1 DAY
AND n.transactiontype LIKE 'OUT_%'
) AS `average`
FROM ( SELECT DATE('2015-04-01') AS dt
UNION ALL SELECT DATE('2015-04-02')
UNION ALL SELECT DATE('2015-04-03')
UNION ALL SELECT DATE('2015-04-04')
UNION ALL SELECT DATE('2015-04-05')
UNION ALL SELECT DATE('2015-04-06')
etc...
) cal
WHERE cal.dt >= '2015-04-01'
AND cal.dt < '2015-05-01'
ORDER BY cal.dt
I am trying to use another answer by @ spencer7593. I created a table "source of integers" with numbers 0-31 as he suggested. This method took just over 1.8 seconds.
SELECT cal.sd, cal.ed
, ( -- correlated subquery references value returned from cal
SELECT ROUND(SUM(n.OutCount)/30)
FROM inventory n
JOIN item t
ON t.itemcode = n.itemcode
WHERE t.subcategory = 'foobar'
AND n.TrDateTime >= cal.ed + INTERVAL -30 DAY
AND n.TrDateTime < cal.ed + INTERVAL 1 DAY
AND n.transactiontype LIKE 'OUT_%'
) AS `average`
FROM ( SELECT (CURDATE() + INTERVAL -30 DAY) + INTERVAL i.n DAY as `ed`, (((CURDATE() + INTERVAL -30 DAY) + INTERVAL i.n DAY) + INTERVAL - 30 DAY) as `sd`
FROM source_of_integers i
WHERE i.n >= 0
AND i.n < 31
ORDER BY i.n
) cal
WHERE cal.ed >= CURDATE() + INTERVAL -29 DAY
AND cal.ed <= CURDATE()
ORDER BY cal.ed;
For these dates you need the source of the strings, it doesn't actually happen. I ended up making a cal.
CREATE TABLE cal (
dt DATE NOT NULL PRIMARY KEY
);
CREATE TABLE ints ( i tinyint );
INSERT INTO ints VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
INSERT INTO cal (dt)
SELECT DATE('2010-01-01') + INTERVAL a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i DAY
FROM ints a JOIN ints b JOIN ints c JOIN ints d JOIN ints e
WHERE (a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i) <= 3651
ORDER BY 1;
And then followed a slightly modified version of @ spencer7593's answer on it.
SELECT cal.dt
, ( -- correlated subquery references value returned from cal
SELECT ROUND(SUM(n.OutCount)/30)
FROM inventory n
JOIN item t
ON t.itemcode = n.itemcode
WHERE t.subcategory = 'foo'
AND n.TrDateTime >= cal.dt + INTERVAL -28 DAY
AND n.TrDateTime < cal.dt + INTERVAL 1 DAY
AND n.transactiontype LIKE 'OUT_%'
) AS `average`
FROM cal
WHERE cal.dt >= CURDATE() + INTERVAL -30 DAY
AND cal.dt < CURDATE()
ORDER BY cal.dt;
In my opinion, I believe this is the cleanest (less PHP) and most efficient answer.
Here's how I indexed the inventory table to speed it up significantly:
ALTER TABLE inventory ADD KEY (ItemCode, TrDateTime, TransactionType);
Thanks @OllieJones and @ spencer7593 for your help!
source to share
If you create a calendar table and populate it with a range of date values ββlike
CREATE TABLE cal (dt DATE NOT NULL PRIMARY KEY) ;
INSERT INTO cal VALUES ('2015-04-01'),('2015-04-02'),('2015-04-03'), ... ;
you can use this as a string source in a query like:
SELECT cal.dt
, ( -- correlated subquery references value returned from cal
SELECT ROUND(SUM(n.OutCount)/30)
FROM inventory n
JOIN item t
ON t.itemcode = n.itemcode
WHERE t.subcategory = 'foo'
AND n.TrDateTime >= cal.dt + INTERVAL -28 DAY
AND n.TrDateTime < cal.dt + INTERVAL 1 DAY
AND n.transactiontype LIKE 'OUT_%'
) AS `average`
FROM cal
WHERE cal.dt >= '2015-04-01'
AND cal.dt < '2015-05-01'
ORDER BY cal.dt
It is not necessary to create a calendar table cal
. We could use an inline view and give it an alias cal
. For example, in the query above, we could replace this line:
FROM cal
with this:
FROM ( SELECT DATE('2015-04-01') AS dt
UNION ALL SELECT DATE('2015-04-02')
UNION ALL SELECT DATE('2015-04-03')
UNION ALL SELECT DATE('2015-04-04')
UNION ALL SELECT DATE('2015-04-05')
) cal
Or, if you have a string source that can give you a contiguous sequence of integers starting at zero, you can create your date values ββfrom the base date, for example
FROM ( SELECT '2014-04-01' + INTERVAL i.n DAY
FROM source_of_integers i
WHERE i.n >= 0
AND i.n < 31
ORDER BY i.n
) cal
Some notes:
The original query displays an outer ( LEFT
) join, but the equality predicate in the WHERE clause negates the "grace" of the join, which is equivalent to an inner join.
Some of the column references in the query are undefined. The best practice is to qualify all column references, then the reader can figure out which columns are coming from those tables without requiring the reader to know which columns are in the tables. This also protects the assertion from future hacking (with an "ambiguous column" error) when a column with the same name is added to another table specified in the query.)
Followup
Personally, for a limited number of date values, I would go with an inline view that does not reference the table. I would ask for PHP code for this request.
With a start date, say "2015-04-10", I would take that date value and format it in a query, which is equivalent to:
$cal = "SELECT DATE('2015-04-10') AS dt" ;
I would then loop through and increment that value by 1 day. Each time through the loop I add the $cal
next date to the pick, the network effect of running through the loop three times is equivalent to this:
$cal .= " UNION ALL SELECT DATE('2015-04-11')";
$cal .= " UNION ALL SELECT DATE('2015-04-12')";
$cal .= " UNION ALL SELECT DATE('2015-04-13')";
As a less attractive alternative, we can keep repeating the same start date value and just increment the integer value, and MySQL does the date math for us.
$cal .= " UNION ALL SELECT '2015-04-10' + INTERVAL 1 DAY";
$cal .= " UNION ALL SELECT '2015-04-10' + INTERVAL 2 DAY";
$cal .= " UNION ALL SELECT '2015-04-10' + INTERVAL 3 DAY";
Then I just dragged the query $cal
into the SQL text as an inline view query. Something like that:
$sql = "SELECT cal.dt
, ( SELECT IFNULL(ROUND(SUM
,0) AS average_
FROM ( " . $cal . " ) cal
LEFT
JOIN item ON ... ";
Anyway, this is the approach I would take if it was for a limited number of date values ββ(a couple of dozen or so), and if I was only going to execute this query occasionally, without clogging the database server with this, repeating the query for every request.) If I was going to fix the server, I would create and maintain a real table cal
, instead of incurring the overhead of materializing the derived table for each query.
source to share
Do you have data for every single day in the range? If so, it is a bit of a complicated join operation, but very doable.
You can get the date ranges you want like this:
SELECT DISTINCT
DATE(trdatetime)- INTERVAL 30 DAY AS startdate,
DATE(trdatetime) AS enddateplus1
FROM inventory
WHERE trdatetime >= NOW() - INTERVAL 31 DAY
Debug this query. Take a look to make sure you are getting every date range you want.
Then you can join this in your business request like:
SELECT dates.startdate,
ROUND(SUM(OutCount)/30) AS 'average'
FROM (
SELECT DISTINCT
DATE(trdatetime)- INTERVAL 30 DAY AS startdate,
DATE(trdatetime) AS enddateplus1
FROM inventory
WHERE trdatetime >= NOW() - INTERVAL 31 DAY
) dates
LEFT JOIN inventory ON i.trdatetime >= dates.startdate
AND i.trdatetime < dates.enddateplus1
LEFT JOIN item ON i.itemcode = item.itemcode
WHERE item.subcategory = 'T-Shirts'
AND transactiontype like 'OUT_%'
GROUP BY dates.startdate
If your inventory data is sparse i.e. if you have no transactions for all days, then some rows will be missing in your request for dates.
There is a way to fill in those missing lines. But this is a pain in s. Read this for more information. http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/
Note that it BETWEEN
works very poorly for filtering values DATETIME
or TIMESTAMP
.
source to share