Select rows with closest timestamp

I have a table that looks something like this; it essentially contains a timestamp as well as some other columns:

WeatherTable
+---------------------+---------+----------------+------+
| TS                  | MonthET | InsideHumidity | .... |
+---------------------+---------+----------------+------+
| 2014-10-27 14:24:22 |       0 |             54 |      |
| 2014-10-27 14:24:24 |       0 |             54 |      |
| 2014-10-27 14:24:26 |       0 |             52 |      |
| 2014-10-27 14:24:28 |       0 |             54 |      |
| 2014-10-27 14:24:30 |       0 |             53 |      |
| 2014-10-27 14:24:32 |       0 |             55 |      |
| 2014-10-27 14:24:34 |       9 |             54 |      |
.......

      

I am trying to formulate a SQL query that returns all rows within a specific time span (no problem here) at some arbitrary granularity, for example every 15 seconds. The number is always specified in seconds, but is not limited to values less than 60. To complicate matters, the timestamps do not necessarily fall on the required granularity, so this is not a case of simply choosing the timestamps 14:24:00, 14:24:15, 14:24:30, etc.; the row with the closest timestamp to each value should be included in the result.

For example, if the start time was given as 14:24:30, the end time as 14:32:00, and the granularity as 130, the ideal times would be:

14:24:30
14:26:40
14:28:50
14:31:00

      

However, timestamps may not exist for each of these ideal times, in which case the row with the nearest timestamp should be selected for each ideal time. If two timestamps are equally distant from the ideal time, select the earlier one.
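To make the requirement concrete, here is a minimal Python sketch of the selection rule described above. It is only an illustration of the intended result, not a replacement for a SQL solution; the function names `ideal_times` and `nearest_rows` and the sample timestamps are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def ideal_times(start, end, step_seconds):
    """Yield start, start + step, ... for as long as the value is not past end."""
    current = start
    while current <= end:
        yield current
        current += timedelta(seconds=step_seconds)


def nearest_rows(timestamps, start, end, step_seconds):
    """Pair each ideal time with the closest timestamp.

    `timestamps` must be non-empty and sorted ascending (hypothetical input).
    """
    result = []
    for target in ideal_times(start, end, step_seconds):
        i = bisect_left(timestamps, target)
        # Only the neighbours just before and at/after the target can be closest.
        candidates = timestamps[max(i - 1, 0):i + 1]
        # min() returns the first minimum and candidates are ascending,
        # so the earlier timestamp wins a tie.
        result.append((target, min(candidates, key=lambda ts: abs(ts - target))))
    return result


day = (2014, 10, 27)
rows = [datetime(*day, 14, 24, 28), datetime(*day, 14, 26, 35),
        datetime(*day, 14, 26, 45), datetime(*day, 14, 28, 50),
        datetime(*day, 14, 31, 10)]
pairs = nearest_rows(rows, datetime(*day, 14, 24, 30), datetime(*day, 14, 32, 0), 130)
```

With this made-up data, the ideal time 14:26:40 is equally far from 14:26:35 and 14:26:45, and the earlier row is chosen.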

The database is part of a web service, so for the time being I just ignore the granularity in the SQL query and filter out the unwanted results later in Java. However, this seems far from ideal in terms of memory consumption and performance.

Any ideas?

+3


2 answers


You can try doing it like this:

First, create a list of time intervals. Using the stored procedure make_intervals from "Get a list of dates between two dates", create the temporary table by calling it like this:

call make_intervals(@startdate,@enddate,15,'SECOND');

      

You will then have a table time_intervals with one of its two columns named interval_start. Use this to find the closest timestamp for each interval, like this:



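-- MySQL does not allow a TEMPORARY table to be referenced twice in one
-- query, hence the copy.
CREATE TEMPORARY TABLE IF NOT EXISTS time_intervals_copy
  AS (SELECT * FROM time_intervals);

SELECT
  time_intervals.interval_start,
  WeatherTable.*
FROM time_intervals
JOIN WeatherTable
  ON WeatherTable.TS BETWEEN @startdate AND @enddate
JOIN (SELECT
        time_intervals.interval_start AS interval_start,
        -- TIMESTAMPDIFF returns the distance in seconds; subtracting DATETIME
        -- values directly would compare their numeric YYYYMMDDhhmmss forms.
        MIN(ABS(TIMESTAMPDIFF(SECOND, time_intervals.interval_start, WeatherTable.TS))) AS ts_diff
      FROM time_intervals_copy AS time_intervals
      JOIN WeatherTable
      WHERE WeatherTable.TS BETWEEN @startdate AND @enddate
      GROUP BY time_intervals.interval_start) AS nearest
  ON nearest.interval_start = time_intervals.interval_start AND
     ABS(TIMESTAMPDIFF(SECOND, time_intervals.interval_start, WeatherTable.TS)) = nearest.ts_diff
-- Collapses equally close rows to one arbitrary row per interval; this form
-- of GROUP BY is rejected when ONLY_FULL_GROUP_BY is enabled.
GROUP BY time_intervals.interval_start;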

      

This will find the closest timestamp for each interval. Note: each row in WeatherTable can be returned more than once if the interval used is less than half of the interval of the stored data (or something like that, you get the point ;)).
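The repeated-row effect is easy to see with a small Python sketch. The offsets are made-up values in seconds from the start time, and `nearest` is a hypothetical helper, not part of the query.

```python
def nearest(stored, target):
    """Closest stored offset to target; the earlier one wins a tie."""
    return min(sorted(stored), key=lambda ts: abs(ts - target))


stored = [0, 10]            # rows stored every 10 seconds
targets = range(0, 11, 2)   # ideal times every 2 seconds
matches = [nearest(stored, t) for t in targets]
print(matches)
```

Each of the two stored rows is matched by three different ideal times, so the caller may want to de-duplicate the result if that matters.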

Note: I have not tested the queries; they are written from memory. Please adapt them to your use case and fix any minor bugs that may be there ...
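One detail worth checking when adapting a query like this: MySQL treats a DATETIME used in arithmetic as a number of the form YYYYMMDDhhmmss, so subtracting two such values does not give seconds once a minute boundary is crossed. TIMESTAMPDIFF(SECOND, a, b) or UNIX_TIMESTAMP() avoids this. The Python sketch below only imitates that numeric form to show the effect; `mysql_numeric` is a hypothetical helper.

```python
from datetime import datetime


def mysql_numeric(dt):
    """Numeric YYYYMMDDhhmmss form, imitating a DATETIME used in arithmetic."""
    return int(dt.strftime("%Y%m%d%H%M%S"))


a = datetime(2014, 10, 27, 14, 24, 58)
b = datetime(2014, 10, 27, 14, 25, 0)

naive_diff = mysql_numeric(b) - mysql_numeric(a)  # difference of the numeric forms
real_diff = int((b - a).total_seconds())          # actual elapsed seconds
print(naive_diff, real_diff)
```

Here the two timestamps are 2 seconds apart, but the difference of the numeric forms is 42.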

+3



For testing purposes, I have expanded your dataset to the following timestamps. The column in my database is called time_stamp.

2014-10-27 14:24:24
2014-10-27 14:24:26
2014-10-27 14:24:28
2014-10-27 14:24:32
2014-10-27 14:24:34
2014-10-27 14:24:25
2014-10-27 14:24:32
2014-10-27 14:24:34
2014-10-27 14:24:36
2014-10-27 14:24:37
2014-10-27 14:24:39
2014-10-27 14:24:44
2014-10-27 14:24:47
2014-10-27 14:24:53

      

I'll outline the idea first and explain it in more detail before providing the solution I was able to get working.

The requirement is the nearest timestamp on either side (+/-) of a given time. Since we have to look in both directions, we take the interval and split it in half. The range from -1/2 interval to +1/2 interval around each target time then determines the "bin" to consider.

The bin for a given time, measured from a given start time with an interval of @seconds, is then determined by this MySQL expression:

((floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds)

      

NOTE: The + 1 trick exists so that we don't end up with a bin index of -1 (bins start at zero). All times are calculated relative to the start time so that this also works for intervals >= 60 seconds.



Inside each bin, we need to know the distance of each timestamp from the center of the bin. This is done by taking the number of seconds from the start, subtracting the bin value, and then taking the absolute value.
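As a quick check of the binning arithmetic, here is a small Python mirror of the expression above. The function names `bin_center` and `distance_from_center` are hypothetical, and the offsets are example values in seconds from the start time.

```python
import math


def bin_center(offset, seconds):
    """Mirror of (floor((offset - seconds/2) / seconds) + 1) * seconds."""
    return (math.floor((offset - seconds / 2) / seconds) + 1) * seconds


def distance_from_center(offset, seconds):
    """Absolute distance of an offset from the center of its bin."""
    return abs(offset - bin_center(offset, seconds))


seconds = 7
offsets = [0, 3, 4, 10, 11]
centers = [bin_center(o, seconds) for o in offsets]
distances = [distance_from_center(o, seconds) for o in offsets]
print(centers, distances)
```

With an interval of 7, offsets 0 and 3 fall into the bin centered on 0, offsets 4 and 10 into the bin centered on 7, and offset 11 into the bin centered on 14.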

At this point, we have "binned" all the times and can order them within each bin.

To filter these results, we LEFT JOIN the table to itself, with join conditions that match the rows we want to discard. After the LEFT JOIN, the desired rows will have NULL in the columns of the LEFT JOINed table.

My query is rather hacky; I replaced start, end and seconds with variables, but only for readability. MySQL-style comments are included in the LEFT JOIN ... ON clause to identify the conditions.

# Note: the subtractions below are meant to yield seconds from @time_start.
# If time_stamp is a DATETIME and the range crosses a minute boundary,
# use TIMESTAMPDIFF(SECOND, @time_start, time_stamp) instead.
SET @seconds = 7;
SET @time_start = TIMESTAMP('2014-10-27 14:24:24');
SET @time_end = TIMESTAMP('2014-10-27 14:24:52');

SELECT t1.*
FROM temp t1
LEFT JOIN temp t2 ON
  #Condition 1: Only consider rows in the same "bin"
  ((floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds)
 = ((floor(((t2.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds)
AND
(
  #Condition 2 (Part A): "Filter" by removing rows which are farther from the center of the bin than others
  abs(
      (t1.time_stamp - @time_start)
      - (floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
  )
  > 
  abs(
      (t2.time_stamp - @time_start)
      - (floor(((t2.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
  )
  OR
  #Condition 2 (Part B1): "Filter" by removing rows which are the same distance from the center of the bin
  (
    abs(
        (t1.time_stamp - @time_start)
        - (floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
    )
    =
    abs(
        (t2.time_stamp - @time_start)
        - (floor(((t2.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
    )
    #Condition 2 (Part B2): And are later than the other match
    AND
      (t1.time_stamp - @time_start)
      >
      (t2.time_stamp - @time_start)
  )
)
WHERE t1.time_stamp - @time_start >= 0
AND @time_end - t1.time_stamp >= 0
#Condition 3: All rows which have a match are undesirable, so those 
#with a NULL for the primary key (in this case temp_id) are selected
AND t2.temp_id IS NULL

      

There might be a more concise way to write the query, but it filtered the results down to what was needed, with one notable exception: I purposefully inserted a duplicate record, and this query returns both copies, since both match the specified criteria.
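For anyone who wants to sanity-check the expected result outside the database, here is a minimal Python sketch that mirrors the three join conditions on the test timestamps listed earlier. It is an illustration only; the names `offset`, `bin_center` and `closest_per_bin` are hypothetical.

```python
import math
from datetime import datetime


def offset(ts, start):
    """Seconds elapsed from start to ts."""
    return (ts - start).total_seconds()


def bin_center(ts, start, seconds):
    return (math.floor((offset(ts, start) - seconds / 2) / seconds) + 1) * seconds


def closest_per_bin(rows, start, end, seconds):
    """Keep each in-range row unless another row in the same bin is closer to
    the bin center, or equally close and earlier (the LEFT JOIN ... IS NULL idea)."""
    def dist(ts):
        return abs(offset(ts, start) - bin_center(ts, start, seconds))

    kept = []
    for t1 in rows:
        if not (start <= t1 <= end):
            continue
        beaten = any(
            bin_center(t2, start, seconds) == bin_center(t1, start, seconds)
            and (dist(t2) < dist(t1) or (dist(t2) == dist(t1) and t2 < t1))
            for t2 in rows
        )
        if not beaten:
            kept.append(t1)
    return kept


secs = [24, 26, 28, 32, 34, 25, 32, 34, 36, 37, 39, 44, 47, 53]
rows = [datetime(2014, 10, 27, 14, 24, s) for s in secs]
result = closest_per_bin(rows, datetime(2014, 10, 27, 14, 24, 24),
                         datetime(2014, 10, 27, 14, 24, 52), 7)
print([r.strftime("%H:%M:%S") for r in result])
```

On this data the sketch keeps 14:24:24, both copies of 14:24:32, 14:24:37 and 14:24:44, which shows the duplicate behaviour mentioned above. A tie-break on the primary key would be one way to drop the extra copy.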

+1

