Select for averages based on different date ranges in one MySQL query

Question

Select for averages based on different date ranges in one MySQL query

Basically I am trying to plot a chart with this data. I can put my query in a while loop in PHP to get the average, but I would prefer it done with a single query creating one table of results.

<?php 

date_default_timezone_set('America/Los_Angeles');

include('../connect.php');

$subcategory = 'T-Shirts';

$date = date('Y-m-d', strtotime('-29 days'));
$today = date("Y-m-d");

$subcategory = mysqli_real_escape_string($conp, $subcategory);

echo "<table border=\"1\">";
echo "<tr>";
echo "<th>date</th>";
echo "<th>average</th>";
echo "</tr>";

while (strtotime($date) <= strtotime($today)) {

    $from_date = date ("Y-m-d", strtotime("-29 day", strtotime($date)));

    $query = $conp->query("SELECT ROUND(SUM(OutCount)/30) AS 'average' FROM inventory
    LEFT JOIN item
    ON inventory.itemcode = item.itemcode
    WHERE item.subcategory = '$subcategory'
    AND TrDateTime BETWEEN '$from_date' AND '$date' AND transactiontype like 'OUT_%'"); 

    if($query->num_rows){       
        while($row = mysqli_fetch_array($query, MYSQL_ASSOC)){                      
            if(!empty($row['average'])){
                $average = $row['average'];
            }else{
                $average = "N/A";
            }
        }                       
        mysqli_free_result($query);                             
    }else{
        $average = "N/A";
    }

    $date = date ("Y-m-d", strtotime("+1 day", strtotime($date)));

    echo "<tr>";
    echo "<td>" . $date . "</td>";
    echo "<td>" . $average . "</td>";
    echo "</tr>";
}

echo "</table>";

?>

I get all dates for the last 30 days (including today) and average sales 29 days before that date.

+------------+----------+  
| date       | average  |  
+------------+----------+  
| 2015-04-09 | 222      |  
| 2015-04-10 | 225      |  
| 2015-04-11 | 219      |  
| ...        | ...      |  
+------------+----------+

I can get everything I need, but 29 queries work in this situation and MySQL will be significantly faster. I started to come up with a MySQL procedure, but I'm not sure how well it will work when I try to call it using PHP.

DELIMITER //
    CREATE PROCEDURE average_daily_sales()
    BEGIN

        SET @today = CURDATE();
        SET @date_var = CURDATE() - INTERVAL 29 DAY;
        SET @from_date = @date_var - INTERVAL 29 DAY;
        SET @to_date = @from_date + INTERVAL 29 DAY;

        label1: WHILE @date_var < @today DO

            SELECT      DATE_FORMAT(trdatetime, '%Y-%m-%d') as 'date', ROUND(SUM(OutCount)/30) AS 'average'
            FROM        inventory
            LEFT JOIN   item
            ON          inventory.itemcode = item.itemcode
            WHERE       item.subcategory = 'T-Shirts'
            AND         trdatetime BETWEEN @from_date - INTERVAL 29 DAY AND @to_date
            AND         transactiontype like 'OUT_%';

            SET @date_var = @date_var + INTERVAL 1 DAY;

        END WHILE label1;    

    END; //
DELIMITER ;

Ultimately, I would prefer a regular MySQL statement that I can use to create the results table I want in one shot. Any help would be greatly appreciated.

+3

php mysql datetime mysqli aggregate-functions

Justin emmons May 07 '15 at 19:49

source to share

3 answers

If you create a calendar table and populate it with a range of date values like

CREATE TABLE cal (dt DATE NOT NULL PRIMARY KEY) ;
INSERT INTO cal VALUES ('2015-04-01'),('2015-04-02'),('2015-04-03'), ... ;

you can use this as a string source in a query like:

SELECT cal.dt
     , ( -- correlated subquery references value returned from cal
         SELECT ROUND(SUM(n.OutCount)/30)
           FROM inventory n
           JOIN item t
             ON t.itemcode = n.itemcode
          WHERE t.subcategory = 'foo'
            AND n.TrDateTime >= cal.dt + INTERVAL -28 DAY
            AND n.TrDateTime <  cal.dt + INTERVAL 1 DAY
            AND n.transactiontype LIKE 'OUT_%'
       ) AS `average`
  FROM cal
 WHERE cal.dt >= '2015-04-01'
   AND cal.dt <  '2015-05-01'
 ORDER BY cal.dt

It is not necessary to create a calendar table cal

. We could use an inline view and give it an alias cal

. For example, in the query above, we could replace this line:

  FROM cal

with this:

  FROM ( SELECT DATE('2015-04-01') AS dt
         UNION ALL SELECT DATE('2015-04-02')
         UNION ALL SELECT DATE('2015-04-03')
         UNION ALL SELECT DATE('2015-04-04')
         UNION ALL SELECT DATE('2015-04-05')
       ) cal

Or, if you have a string source that can give you a contiguous sequence of integers starting at zero, you can create your date values from the base date, for example

   FROM ( SELECT '2014-04-01' + INTERVAL i.n DAY
            FROM source_of_integers i
           WHERE i.n >= 0
             AND i.n < 31
           ORDER BY i.n
        ) cal

Some notes:

The original query displays an outer ( LEFT

) join, but the equality predicate in the WHERE clause negates the "grace" of the join, which is equivalent to an inner join.

Some of the column references in the query are undefined. The best practice is to qualify all column references, then the reader can figure out which columns are coming from those tables without requiring the reader to know which columns are in the tables. This also protects the assertion from future hacking (with an "ambiguous column" error) when a column with the same name is added to another table specified in the query.)

Followup

Personally, for a limited number of date values, I would go with an inline view that does not reference the table. I would ask for PHP code for this request.

With a start date, say "2015-04-10", I would take that date value and format it in a query, which is equivalent to:

$cal = "SELECT DATE('2015-04-10') AS dt" ;

I would then loop through and increment that value by 1 day. Each time through the loop I add the $cal

next date to the pick, the network effect of running through the loop three times is equivalent to this:

$cal .= " UNION ALL SELECT DATE('2015-04-11')";
$cal .= " UNION ALL SELECT DATE('2015-04-12')";
$cal .= " UNION ALL SELECT DATE('2015-04-13')";

As a less attractive alternative, we can keep repeating the same start date value and just increment the integer value, and MySQL does the date math for us.

$cal .= " UNION ALL SELECT '2015-04-10' + INTERVAL 1 DAY";
$cal .= " UNION ALL SELECT '2015-04-10' + INTERVAL 2 DAY";
$cal .= " UNION ALL SELECT '2015-04-10' + INTERVAL 3 DAY";

Then I just dragged the query $cal

into the SQL text as an inline view query. Something like that:

$sql = "SELECT cal.dt
             , ( SELECT IFNULL(ROUND(SUM
                 ,0) AS average_
          FROM ( " . $cal . " ) cal
          LEFT
          JOIN item ON ... ";

Anyway, this is the approach I would take if it was for a limited number of date values (a couple of dozen or so), and if I was only going to execute this query occasionally, without clogging the database server with this, repeating the query for every request.) If I was going to fix the server, I would create and maintain a real table cal

, instead of incurring the overhead of materializing the derived table for each query.

+2

spencer7593 May 07 '15 at 20:22

source to share

Do you have data for every single day in the range? If so, it is a bit of a complicated join operation, but very doable.

You can get the date ranges you want like this:

        SELECT DISTINCT
               DATE(trdatetime)- INTERVAL 30 DAY AS startdate,
               DATE(trdatetime)                  AS enddateplus1
          FROM inventory
         WHERE trdatetime >= NOW() - INTERVAL 31 DAY

Debug this query. Take a look to make sure you are getting every date range you want.

Then you can join this in your business request like:

  SELECT dates.startdate, 
         ROUND(SUM(OutCount)/30) AS 'average'
   FROM (
        SELECT DISTINCT
               DATE(trdatetime)- INTERVAL 30 DAY AS startdate,
               DATE(trdatetime)                  AS enddateplus1
          FROM inventory
         WHERE trdatetime >= NOW() - INTERVAL 31 DAY
        ) dates
   LEFT JOIN inventory  ON i.trdatetime >= dates.startdate
                       AND i.trdatetime <  dates.enddateplus1 
   LEFT JOIN  item ON  i.itemcode = item.itemcode
  WHERE item.subcategory = 'T-Shirts'
    AND transactiontype like 'OUT_%'
  GROUP BY dates.startdate

If your inventory data is sparse i.e. if you have no transactions for all days, then some rows will be missing in your request for dates.

There is a way to fill in those missing lines. But this is a pain in s. Read this for more information. http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/

Note that it BETWEEN

works very poorly for filtering values DATETIME

or TIMESTAMP

.

+2

O. Jones May 07 '15 at 20:23

source to share

Justin emmons · Accepted Answer · 2015-05-08T17:35:20+0000

In the suggestions from @OllieJones and @ spencer7593 either required a "transaction" every day, in order to use SELECT DISTINCT DATE(trdatetime)

, you needed to create another table, or you needed to create a derived table.

SELECT DISTINCT DATE(trdatetime)

was not an option for me, because I did not have transactions for every day.

The hybrid PHP and MySQL example that @ spencer7593 suggested would generate the derived table very well. As a result, the static version took about 1.8 seconds to get the result. The problem is that you will need additional PHP to generate this ... (see @ Spencer7593's answer)

SELECT cal.dt
     , ( -- correlated subquery references value returned from cal
         SELECT ROUND(SUM(n.OutCount)/30)
           FROM inventory n
           JOIN item t
             ON t.itemcode = n.itemcode
          WHERE t.subcategory = 'foo'
            AND n.TrDateTime >= cal.dt + INTERVAL -28 DAY
            AND n.TrDateTime <  cal.dt + INTERVAL 1 DAY
            AND n.transactiontype LIKE 'OUT_%'
       ) AS `average`
  FROM ( SELECT DATE('2015-04-01') AS dt
        UNION ALL SELECT DATE('2015-04-02')
        UNION ALL SELECT DATE('2015-04-03')
        UNION ALL SELECT DATE('2015-04-04')
        UNION ALL SELECT DATE('2015-04-05')
        UNION ALL SELECT DATE('2015-04-06')
etc...
       ) cal
 WHERE cal.dt >= '2015-04-01'
   AND cal.dt <  '2015-05-01'
 ORDER BY cal.dt

I am trying to use another answer by @ spencer7593. I created a table "source of integers" with numbers 0-31 as he suggested. This method took just over 1.8 seconds.

SELECT cal.sd, cal.ed
     , ( -- correlated subquery references value returned from cal
         SELECT ROUND(SUM(n.OutCount)/30)
           FROM inventory n
           JOIN item t
             ON t.itemcode = n.itemcode
          WHERE t.subcategory = 'foobar'
            AND n.TrDateTime >= cal.ed + INTERVAL -30 DAY
            AND n.TrDateTime <  cal.ed + INTERVAL 1 DAY
            AND n.transactiontype LIKE 'OUT_%'
       ) AS `average`
  FROM ( SELECT (CURDATE() + INTERVAL -30 DAY) + INTERVAL i.n DAY as `ed`, (((CURDATE() + INTERVAL -30 DAY) + INTERVAL i.n DAY) + INTERVAL - 30 DAY) as `sd`
            FROM source_of_integers i
           WHERE i.n >= 0
             AND i.n < 31
           ORDER BY i.n
        ) cal
WHERE cal.ed >= CURDATE() + INTERVAL -29 DAY
   AND cal.ed <=  CURDATE()
 ORDER BY cal.ed;

For these dates you need the source of the strings, it doesn't actually happen. I ended up making a cal.

CREATE TABLE cal (
    dt DATE NOT NULL PRIMARY KEY
);

CREATE TABLE ints ( i tinyint );

INSERT INTO ints VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

INSERT INTO cal (dt)
SELECT DATE('2010-01-01') + INTERVAL a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i DAY
FROM ints a JOIN ints b JOIN ints c JOIN ints d JOIN ints e
WHERE (a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i) <= 3651
ORDER BY 1;

And then followed a slightly modified version of @ spencer7593's answer on it.

SELECT cal.dt
     , ( -- correlated subquery references value returned from cal
         SELECT ROUND(SUM(n.OutCount)/30)
           FROM inventory n
           JOIN item t
             ON t.itemcode = n.itemcode
          WHERE t.subcategory = 'foo'
            AND n.TrDateTime >= cal.dt + INTERVAL -28 DAY
            AND n.TrDateTime <  cal.dt + INTERVAL 1 DAY
            AND n.transactiontype LIKE 'OUT_%'
       ) AS `average`
  FROM cal
WHERE cal.dt >= CURDATE() + INTERVAL -30 DAY
    AND cal.dt <  CURDATE()
ORDER BY cal.dt;

In my opinion, I believe this is the cleanest (less PHP) and most efficient answer.

Here's how I indexed the inventory table to speed it up significantly:

ALTER TABLE inventory ADD KEY (ItemCode, TrDateTime, TransactionType);

Thanks @OllieJones and @ spencer7593 for your help!

Select for averages based on different date ranges in one MySQL query

More articles: