SQL query to compute calculations from different rows of the same table on SQL Server

I am new to SQL and need help with a query. The table below is only a sample of my data; the real table has about 30 million rows, and I would like a query that produces the result table shown further down.

   Id        type        data          time
-----------------------------------------------------------
    1          30          3.9          15:50:10.660555
    1          30          4.0          15:50:10.660777
    1          70          11.5         15:50:10.797966
    1          30          4.1          15:50:10.834444
    1          70          12.6         15:50:10.853114
    1          70          16.7         15:50:10.955086
    1          30          5            15:50:10.99
    11         30          3.8          15:50:11.660555
    11         30          4.1          15:50:11.660777
    11         70          12.5         15:50:11.797966
    11         30          4.7          15:50:11.834444
    11         70          12.68        15:50:11.853114
    11         70          16.76        15:50:11.955086
    11         30          5.1          15:50:11.99

      

I have a table as above. For each type = 70 row I need to compute something with the last known type = 30 row. For example, for Id = 1, the first type = 70 row at 15:50:10.797966 needs the type = 30 data at 15:50:10.660777, so I can calculate result = 11.5 / 4.0. Likewise, the type = 70 row at 15:50:10.853114 needs the type = 30 data at 15:50:10.834444, so the result is 12.6 / 4.1.

I want the result to look like this:

Id          type           result             time
------------------------------------------------------
1            70             11.5/4.0        15:50:10.797966
1            70             12.6/4.1        15:50:10.853114
1            70             16.7/4.1        15:50:10.955086
11           70             12.5/4.1        15:50:11.797966
11           70             12.68/4.7       15:50:11.853114
11           70             16.76/4.7       15:50:11.955086

      

I would like to be able to execute these SQL queries in python using pyodbc.

Any help would be greatly appreciated! Thanks in advance!



2 answers


Assuming at least one type = 30 row precedes each type = 70 row within an id, you can do this with OUTER APPLY: get the max time of the type = 30 rows before each type = 70 row, then join back on that time to fetch the value to divide by.

SELECT x.id,
       x.type,
       x.time,
       x.data*1.0/t.data as result
FROM
  (SELECT t.*,t1.maxtime_before
   FROM t 
   OUTER APPLY
     (SELECT max(time) AS maxtime_before
      FROM t t1
      WHERE t1.id=t.id AND t1.type=30 AND t1.time<t.time) t1
   WHERE type = 70
  ) x
JOIN t ON t.id=x.id AND t.time=x.maxtime_before
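
For checking the query on a small extract, the same "latest type = 30 row before each type = 70 row" lookup can be sketched in plain Python. This is only a reference sketch, not something to run on 30 million rows; the function name `ratios` and the tuple layout are assumptions made for illustration, and the last sample time is zero-padded so that string order matches time order.

```python
from bisect import bisect_left
from collections import defaultdict

# Sample rows for Id = 1 from the question: (id, type, data, time).
# Times are zero-padded strings, so string order equals chronological order.
rows = [
    (1, 30, 3.9, "15:50:10.660555"),
    (1, 30, 4.0, "15:50:10.660777"),
    (1, 70, 11.5, "15:50:10.797966"),
    (1, 30, 4.1, "15:50:10.834444"),
    (1, 70, 12.6, "15:50:10.853114"),
    (1, 70, 16.7, "15:50:10.955086"),
    (1, 30, 5.0, "15:50:10.990000"),
]

def ratios(rows):
    """For each type-70 row, divide its data by the latest earlier type-30 data."""
    times = defaultdict(list)   # id -> sorted times of type-30 rows
    values = defaultdict(list)  # id -> data of those rows, in the same order
    for id_, type_, data, time in sorted(rows, key=lambda r: (r[0], r[3])):
        if type_ == 30:
            times[id_].append(time)
            values[id_].append(data)

    out = []
    for id_, type_, data, time in rows:
        if type_ != 70:
            continue
        # Number of type-30 rows strictly before this time (mirrors t1.time < t.time).
        pos = bisect_left(times[id_], time)
        prev = values[id_][pos - 1] if pos else None
        # None plays the role of NULL when no earlier type-30 row exists.
        out.append((id_, time, data / prev if prev is not None else None))
    return out

for row in ratios(rows):
    print(row)
```

Comparing this output with the SQL result for a handful of ids is a cheap way to confirm that the join picks the intended type = 30 row.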

      

If there is no type = 30 row preceding a type = 70 row, you can show NULL in the result column for that row using

WITH x AS
  (SELECT t.*,
          t1.maxtime_before
   FROM t
   OUTER APPLY
     (SELECT max(time) AS maxtime_before
      FROM t t1
      WHERE t1.id=t.id AND t1.type=30 AND t1.time<t.time) t1
   WHERE type = 70
  )
SELECT x.id,
       x.type,
       x.time,
       x.data*1.0/t.data as result
FROM t
JOIN x ON t.id=x.id AND t.time=x.maxtime_before
UNION ALL
SELECT id,
       type,
       time,
       NULL
FROM x
WHERE maxtime_before IS NULL

      




Another way to do this is to use the window function max to keep track of the running maximum time of the type = 30 rows per id.

WITH x AS
  (SELECT t.*,
          MAX(CASE WHEN type=30 THEN time END) OVER(PARTITION BY id ORDER BY time) AS running_max
   FROM t
  )
SELECT x.id,
       x.type,
       x.time,
       x.data*1.0/t.data as result
FROM x
JOIN t ON t.id=x.id AND t.time=x.running_max
WHERE x.type=70
UNION ALL
SELECT id,
       type,
       time,
       NULL
FROM x 
WHERE running_max IS NULL AND type = 70
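
Since the question asks about running this from Python with pyodbc, here is a minimal sketch of one way to do it. The query text is the window-function variant, restricted to type = 70 in both branches. The helper names `connect` and `iter_results`, the batch size, and the connection string in the comments are assumptions for illustration; only `pyodbc.connect`, `cursor()`, `execute()` and `fetchmany()` are actual pyodbc calls.

```python
QUERY = """
WITH x AS (
    SELECT t.*,
           MAX(CASE WHEN type = 30 THEN time END)
               OVER (PARTITION BY id ORDER BY time) AS running_max
    FROM t
)
SELECT x.id, x.type, x.time, x.data * 1.0 / t.data AS result
FROM x
JOIN t ON t.id = x.id AND t.time = x.running_max
WHERE x.type = 70
UNION ALL
SELECT id, type, time, NULL
FROM x
WHERE running_max IS NULL AND type = 70
"""

def connect(conn_str):
    """Open a connection; conn_str is a placeholder supplied by the caller."""
    import pyodbc  # imported here so the rest of the sketch loads without the driver
    return pyodbc.connect(conn_str)

def iter_results(conn, batch_size=10_000):
    """Yield result rows batch by batch so the full result is never held in memory."""
    cursor = conn.cursor()
    cursor.execute(QUERY)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch

# Example use (the connection string is hypothetical, adjust it to your server):
# conn = connect("DRIVER={ODBC Driver 17 for SQL Server};"
#                "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")
# for row in iter_results(conn):
#     print(row.id, row.time, row.result)
```

With around 30 million rows, fetching in batches rather than calling `fetchall()` keeps memory use bounded on the Python side; the heavy lifting still happens in SQL Server.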

      



You can do this using only window functions.

For each row, get the previous type and value. Also, enumerate the 70s in such a way that you can identify them as a group (you can do this with a cumulative sum).

In the next step, use a partitioned max to get the type = 30 value for each group, and finally do your calculation.



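select t.*,
       data / data_30 as result
from (select t.*,
             -- only the first 70 in a group directly follows the group's 30;
             -- max() spreads that 30's value over every 70 in the group
             max(case when type = 70 and prev_type = 30 then prev_data end) over (partition by id, grp) as data_30
      from (select t.*,
                   -- running count of non-70 rows: a 30 and the 70s after it share one grp
                   sum(case when type <> 70 then 1 else 0 end) over (partition by id order by time) as grp,
                   lag(type) over (partition by id order by time) as prev_type,
                   lag(data) over (partition by id order by time) as prev_data
            from t
            where type in (30, 70)
           ) t
     ) t
where type = 70;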

      

One interesting aspect of this: by limiting the rows to types 30 and 70 only, we ensure that each group of 70s is directly preceded by a 30.
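
To see why the grouping works, here is a small Python sketch of the idea for a single id. It is an illustration under stated assumptions, not a translation of the SQL: the function name `ratios_by_group` is made up, the rows are assumed to be already sorted by time and filtered to types 30 and 70, and the counter `grp` stands in for the cumulative sum.

```python
def ratios_by_group(rows):
    """rows: (type, data) tuples for ONE id, sorted by time, types 30 and 70 only."""
    out = []
    grp = 0       # running count of non-70 rows, like the cumulative sum in the query
    data_30 = {}  # grp -> data of the type-30 row that opened the group
    for type_, data in rows:
        if type_ != 70:
            grp += 1              # every type-30 row starts a new group
            data_30[grp] = data
        else:
            base = data_30.get(grp)  # None for 70s before the first 30 (grp == 0)
            out.append((grp, data, data / base if base is not None else None))
    return out

# Id = 1 from the question, reduced to (type, data) in time order.
sample = [(30, 3.9), (30, 4.0), (70, 11.5), (30, 4.1), (70, 12.6), (70, 16.7), (30, 5.0)]
for grp, data, result in ratios_by_group(sample):
    print(grp, data, result)
```

Each 70 lands in the group opened by the most recent 30, so the two 70s after the 4.1 row share group 3 and are both divided by 4.1.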
