SQL group by date
I have read several topics about sequence group, this is almost what I need, but I could not find a solution for my problem.
I have a table like this:
PlanificatorPozitieID JalonID DataStart DataFinal
--------------------- ----------- ----------------------- -----------------------
26 46 2012-05-21 00:00:00.000 2012-05-31 00:00:00.000
28 48 2012-06-01 00:00:00.000 2012-06-01 00:00:00.000
27 60 2012-06-02 00:00:00.000 2012-06-02 00:00:00.000
29 60 2012-06-07 00:00:00.000 2012-06-08 00:00:00.000
37 60 2012-06-08 00:00:00.000 2012-06-10 00:00:00.000
30 65 2012-06-10 00:00:00.000 2012-06-13 00:00:00.000
31 65 2012-06-18 00:00:00.000 2012-06-24 00:00:00.000
32 65 2012-06-23 00:00:00.000 2012-07-01 00:00:00.000
33 66 2012-07-02 00:00:00.000 2012-07-02 00:00:00.000
34 66 2012-07-02 00:00:00.000 2012-07-05 00:00:00.000
36 66 2012-07-06 00:00:00.000 2012-07-10 00:00:00.000
Desired output:
PlanificatorPozitieID JalonID DataStart DataFinal
--------------------- ----------- ----------------------- -----------------------
26 46 2012-05-21 00:00:00.000 2012-05-31 00:00:00.000
28 48 2012-06-01 00:00:00.000 2012-06-01 00:00:00.000
27 60 2012-06-02 00:00:00.000 2012-06-02 00:00:00.000
29 60 2012-06-07 00:00:00.000 2012-06-10 00:00:00.000
30 65 2012-06-10 00:00:00.000 2012-06-13 00:00:00.000
31 65 2012-06-18 00:00:00.000 2012-07-01 00:00:00.000
33 66 2012-07-02 00:00:00.000 2012-07-05 00:00:00.000
36 66 2012-07-06 00:00:00.000 2012-07-10 00:00:00.000
So I need to group the JalonID, but the group should only be created if DataFinal> = DataStart. I want to get the time interval of each JalonID, but I only want to get the periods that have no pause time.
Home I made it clear.
select MIN(pp.DataStart) as DataStart, MAX(pp.DataFinal) as DataFinal, pp.JalonID FROM #PlanPozitii pp
GROUP BY pp.JalonID
But this query does not satisfy my condition to group by periods that are continuous.
How to clarify. Let's take the following example
30 65 2012-06-10 00:00:00.000 2012-06-13 00:00:00.000
31 65 2012-06-18 00:00:00.000 2012-06-24 00:00:00.000
32 65 2012-06-23 00:00:00.000 2012-07-01 00:00:00.000
2012-06-13 00:00:00.000
< 2012-06-18 00:00:00.000
, so there will be no group between PlanificatorPozitieID 30
and 31
. But 2012-06-24 00:00:00.000
> 2012-06-23 00:00:00.000
so there will now be a group between PlanificatorPozitieID 31
and 32
.
So from these three lines, we get two lines.
30 65 2012-06-10 00:00:00.000 2012-06-13 00:00:00.000
31 65 2012-06-18 00:00:00.000 2012-07-01 00:00:00.000
DECLARE @YourTable TABLE(PlanificatorPozitieID INT, JalonID INT,DataStart DATETIME, DataFinal DATETIME)
INSERT INTO @YourTable VALUES
(39,1223,'2015-02-16 00:00:00.000','2015-02-20 00:00:00.000'),
(43,1223,'2015-02-19 00:00:00.000','2015-02-24 00:00:00.000'),
(40,1223,'2015-02-23 00:00:00.000','2015-02-27 00:00:00.000'),
(42,1223,'2015-03-09 00:00:00.000','2015-03-13 00:00:00.000')
;WITH cte AS
(
SELECT a.PlanificatorPozitieID,
a.JalonID,
a.DataStart,
COALESCE(b.DataFinal,a.datafinal) AS [DataFinal],
ROW_NUMBER() OVER (PARTITION BY a.JalonID ORDER BY DATEDIFF(dd,a.datastart, COALESCE(b.DataFinal,a.datafinal))) [rn],
COUNT(*) OVER (PARTITION BY a.JalonID) [cnt]
FROM @YourTable a
LEFT JOIN @YourTable b
ON a.JalonID = b.JalonID AND
b.DataStart BETWEEN a.DataStart AND a.DataFinal AND
a.PlanificatorPozitieID <> b.PlanificatorPozitieID AND
DATEDIFF(dd,a.DataStart,a.DataFinal) < DATEDIFF(dd,a.DataStart,b.DataFinal)
)
SELECT *
FROM cte
WHERE rn= 1 OR rn=cnt
Result:
PlanificatorPozitieID JalonID DataStart DataFinal rn cnt
--------------------- ----------- ----------------------- ----------------------- -------------------- -----------
40 1223 2015-02-23 00:00:00.000 2015-02-27 00:00:00.000 1 4
43 1223 2015-02-19 00:00:00.000 2015-02-27 00:00:00.000 4 4
Expected Result:
PlanificatorPozitieID JalonID DataStart DataFinal
--------------------- ----------- ----------------------- -----------------------
39 1223 2015-02-16 00:00:00.000 2015-02-27 00:00:00.000
42 1223 2015-03-09 00:00:00.000 2015-03-13 00:00:00.000
source to share
I found solutin, it is not very efficient as it uses 2 cursors. But it works if someone needs an example
DROP TABLE #DateTEst
DROP TABLE #pozitii
drop table #PozitiiJaloaneStandard
CREATE TABLE #DateTest (JalonStandardID int,DataStart datetime,DataFinal datetime)
INSERT INTO #DateTest VALUES (1,'2015-05-05','2015-05-08')
INSERT INTO #DateTest VALUES (1,'2015-05-09','2015-05-13')
INSERT INTO #DateTest VALUES (1,'2015-05-12','2015-05-15')
INSERT INTO #DateTest VALUES (1,'2015-05-16','2015-05-18')
INSERT INTO #DateTest VALUES (1,'2015-05-14','2015-05-19')
INSERT INTO #DateTest VALUES (2,'2015-05-05','2015-05-06')
INSERT INTO #DateTest VALUES (2,'2015-05-06','2015-05-07')
INSERT INTO #DateTest VALUES (2,'2015-05-06','2015-05-09')
INSERT INTO #DateTest VALUES (3,'2015-05-05','2015-05-07')
INSERT INTO #DateTest VALUES (3,'2015-05-08','2015-05-10')
INSERT INTO #DateTest VALUES (4,'2015-05-05','2015-05-08')
INSERT INTO #DateTest VALUES (5,'2015-05-07','2015-05-07')
INSERT INTO #DateTest VALUES (5,'2015-05-08','2015-05-08')
INSERT INTO #DateTest VALUES (5,'2015-05-09','2015-05-12')
INSERT INTO #DateTest VALUES (5,'2015-05-11','2015-05-12')
INSERT INTO #DateTest VALUES (6,'2015-05-05','2015-05-20')
INSERT INTO #DateTest VALUES (6,'2015-05-15','2015-05-18')
CREATE TABLE #Pozitii (DataStart datetime, DataFinal datetime)
CREATE TABLE #PozitiiJaloaneStandard ( JalonStandardID int, DataStart datetime, DataFinal datetime)
Declare @JalonStandarID int
DEclare @PlanificatorPozitieID int
Declare @DataStartPozitie datetime
Declare @DataFinalPozitie datetime
DEclare @DataStartMin datetime
Declare @PozitieMinima int
Declare @DataFinalMin datetime
Declare @DataStartJalonStandard datetime
Declare @DataFinalJalonStandard datetime
Declare Crs_JaloaneStandard Cursor For
Select JalonStandardID
From #DateTest
ORDER BY JalonStandardID
Open Crs_JaloaneStandard
Fetch Next From Crs_JaloaneStandard Into
@JalonStandarID
While @@Fetch_Status = 0
Begin
INSERT INTO #Pozitii
SELECT pp.DataStart,pp.DataFinal
FROM #DateTest pp
WHERE pp.JalonStandardID = @JalonStandarID
GROUP BY pp.DataStart,pp.DataFinal
SELECT @DataStartMin = MIN(DataStart) FROM #Pozitii
SELECT @DataFinalMin = DataFinal FROM
#Pozitii WHERE DataStart = @DataStartMin
Declare Crs_Pozitii Cursor For
SELECT
p.DataStart,p.DataFinal
FROM #Pozitii p
ORDER by p.DataStart ASC
Open Crs_Pozitii
Fetch Next From Crs_Pozitii Into
@DataStartPozitie,@DataFinalPozitie
while @@FETCH_STATUS = 0
begin
if (@DataFinalMin > @DataStartPozitie) and (@DataFinalMin <= @DataFinalPozitie )
begin
set @DataFinalMin = @DataFinalPozitie
end
if (@DataFinalMin <= @DataStartPozitie) begin
INSERT INTO #PozitiiJaloaneStandard VALUES (@JalonStandarID,@DataStartMin,@DataFinalMin)
set @DataFinalMin = @DataFinalPozitie
set @DataStartMin = @DataStartPozitie
print @DataStartPozitie
print @DataFinalPozitie
end
Fetch Next From Crs_Pozitii Into
@DataStartPozitie,@DataFinalPozitie
End
INSERT INTO #PozitiiJaloaneStandard VALUES (@JalonStandarID,@DataStartMin,@DataFinalMin)
DELETE FROM #Pozitii
Close Crs_Pozitii
Deallocate Crs_Pozitii
Fetch Next From Crs_JaloaneStandard Into
@JalonStandarID
End
Close Crs_JaloaneStandard
Deallocate Crs_JaloaneStandard
SELECT * FROM #PozitiiJaloaneStandard
GROUP BY JalonStandardID,DataStart,DataFinal
source to share
I don't know if this will work fine with actual data because I haven't tested it rigorously, but here's the solution:
DECLARE @YourTable TABLE(PlanificatorPozitieID INT, JalonID INT,DataStart DATETIME, DataFinal DATETIME)
INSERT INTO @YourTable VALUES
(26,46,'2012-05-21 00:00:00.000','2012-05-31 00:00:00.000'),
(28,48,'2012-06-01 00:00:00.000','2012-06-01 00:00:00.000'),
(27,60,'2012-06-02 00:00:00.000','2012-06-02 00:00:00.000'),
(29,60,'2012-06-07 00:00:00.000','2012-06-08 00:00:00.000'),
(37,60,'2012-06-08 00:00:00.000','2012-06-10 00:00:00.000'),
(30,65,'2012-06-10 00:00:00.000','2012-06-13 00:00:00.000'),
(31,65,'2012-06-18 00:00:00.000','2012-06-24 00:00:00.000'),
(32,65,'2012-06-23 00:00:00.000','2012-07-01 00:00:00.000'),
(33,66,'2012-07-02 00:00:00.000','2012-07-02 00:00:00.000'),
(34,66,'2012-07-02 00:00:00.000','2012-07-05 00:00:00.000'),
(36,66,'2012-07-06 00:00:00.000','2012-07-10 00:00:00.000')
;WITH cte AS
(
SELECT a.PlanificatorPozitieID,
a.JalonID,
a.DataStart,
COALESCE(b.DataFinal,a.datafinal) AS [DataFinal],
ROW_NUMBER() OVER (PARTITION BY a.JalonID ORDER BY DATEDIFF(dd,a.datastart, COALESCE(b.DataFinal,a.datafinal))) [rn],
COUNT(*) OVER (PARTITION BY a.JalonID) [cnt]
FROM @YourTable a
LEFT JOIN @YourTable b
ON a.JalonID = b.JalonID AND
b.DataStart BETWEEN a.DataStart AND a.DataFinal AND
a.PlanificatorPozitieID <> b.PlanificatorPozitieID AND
DATEDIFF(dd,a.DataStart,a.DataFinal) < DATEDIFF(dd,a.DataStart,b.DataFinal)
)
SELECT *
FROM cte
WHERE rn= 1 OR rn=cnt
source to share
https://msdn.microsoft.com/en-us/library/hh231256.aspx
I was looking for a LAG () function that allows you to access the previous lines in a query. My thought was to calculate the difference between the current row and the previous row in order to get the column to group with;
with a as (
select 1 as ID, 1 as A, 2 as B union all
select 1 as ID, 2 as A, 3 as B union all
select 1 as ID, 3 as A, 4 as B union all
select 1 as ID, 4 as A, 5 as B union all
select 1 as ID, 6 as A, 7 as B
)
select ID, A, B, A-LAG(B,1,0) OVER (order by ID) as koe from a where B > A
If you run this query, you get the result as:
ID A B koe
----------- ----------- ----------- -----------
1 1 2 1
1 2 3 0
1 3 4 0
1 4 5 0
1 6 7 1
Imagine A is DataStart and B is DataFinal and the calculated koe is the difference, as you can see it works on all but the first row ... The first row gets the difference between a non-existent row (so its 0). But this is the direction I would start trying.
source to share