How can I shorten this SQL query without a million AND / OR?
I have a table with 13 columns (one for the ID # row and another 12 - one for each month of the year). Each column will only contain numbers and I want to write multiple queries to check each row for certain conditions and return the id of any rows that match.
So far, I've been good at writing all basic SELECT queries, but now I've gotten a bit stuck in writing SELECT queries to check multiple conditions at once without writing millions of lines of code every time.
The current query I'm writing needs to check every couple of consecutive months (like Jan / Feb, Feb / Mar, etc.) to see how big the difference between them is and needs to return any rows where 2 lots consecutive months have a 20% difference between them, and the remaining pairs have a 10% difference.
For example, if there were 1000 in January and 1300 in February, then the difference is 30%, so 1 lot. Then, if we say that April was 1500, and May was 2100, then the difference is 40%, so the second batch. Then while every other pair (Feb / Mar, Mar / Apr, ..., Nov / Dec) has a 10% difference each, then this row should be returned.
Unfortunately, the only way I can get this to work is to manually check every possible possibility (which works), but not very good for recording similar queries.
Here's a shorthand version of what I have so far:
SELECT pkID
FROM dbo.tblMonthData
WHERE
((colFeb > colJan * 1.2 AND colMar > colFeb * 1.2) AND (colApr < colMar * 1.1 AND colMay < colApr * 1.1 AND colJun < colMay * 1.1 AND colJul < colJun * 1.1 AND colAug < colJul * 1.1 AND colSep < colAug * 1.1 AND colOct < colSep * 1.1 AND colNov < colOct * 1.1 AND colDec < colNov * 1.1))
OR ((colFeb > colJan * 1.2 AND colApr > colMar * 1.2) AND (colMar < colFeb * 1.1 AND colMay < colApr * 1.1 AND colJun < colMay * 1.1 AND colJul < colJun * 1.1 AND colAug < colJul * 1.1 AND colSep < colAug * 1.1 AND colOct < colSep * 1.1 AND colNov < colOct * 1.1 AND colDec < colNov * 1.1))
OR ((colFeb > colJan * 1.2 AND colMay > colApr * 1.2) AND (colMar < colFeb * 1.1 AND colApr < colMar * 1.1 AND colJun < colMay * 1.1 AND colJul < colJun * 1.1 AND colAug < colJul * 1.1 AND colSep < colAug * 1.1 AND colOct < colSep * 1.1 AND colNov < colOct * 1.1 AND colDec < colNov * 1.1))
...
OR ((colNov > colOct * 1.2 AND colDec > colNov * 1.2) AND (colFeb < colJan * 1.1 AND colMar < colFeb * 1.1 AND colApr < colMar * 1.1 AND colMay < colApr * 1.1 AND colJun < colMay * 1.1 AND colJul < colJun * 1.1 AND colAug < colJul * 1.1 AND colSep < colAug * 1.1 AND colOct < colSep * 1.1))
There are about 55 lines of OR statements in total to test all possible combinations of this. If I then went to request something like this (for example, return all Row ID #s where 2 pairs are greater than 50% and 4 pairs are less than 10%), that would involve writing another long query from scratch, checking every possible combination.
So I was wondering how can I rewrite this in a shorter version that might also be more reusable for similar queries?
source to share
So, as an alternative answer to @ Pieter's question and to illustrate how a simplified data structure can make your task easier, I suggest this:
create view tblEasy as (
select pkID, 1 as colMonth, colJan as colValue from tblMonthData
UNION
select pkID, 2 as colMonth, colFeb as colValue from tblMonthData
UNION
select pkID, 3 as colMonth, colMar as colValue from tblMonthData
UNION
select pkID, 4 as colMonth, colApr as colValue from tblMonthData
UNION
select pkID, 5 as colMonth, colMay as colValue from tblMonthData
UNION
select pkID, 6 as colMonth, colJun as colValue from tblMonthData
UNION
select pkID, 7 as colMonth, colJul as colValue from tblMonthData
UNION
select pkID, 8 as colMonth, colAug as colValue from tblMonthData
UNION
select pkID, 9 as colMonth, colSep as colValue from tblMonthData
UNION
select pkID, 10 as colMonth, colOct as colValue from tblMonthData
UNION
select pkID, 11 as colMonth, colNov as colValue from tblMonthData
UNION
select pkID, 12 as colMonth, colDec as colValue from tblMonthData
);
This makes it look similar to how I would first structure the table. Then it is easy to create pairs by comparing the value on colMonth
with that on colMonth + 1
.
I've made a fiddle to illustrate how the comparison can also be done in the view, and then the query itself, if that's obvious enough.
http://sqlfiddle.com/#!3/600f6/4
Please note that performance is poor due to the original table structure.
Update Since this is accepted as an answer, I'll add more details from the sqlfiddle.
Additional view for a preliminary calculation of the difference between consecutive months:
create view tblPairs as (
select t1.pkId , t1.colMonth as colStart, (t2.colValue * 100 / t1.colValue) as colPercentage
from tblEasy as t1
inner join tblEasy as t2
on t1.pkId = t2.pkId and t1.colMonth = t2.colMonth - 1);
Query to find where 2 months have more than 20% increase and the other 9 have less than 10%:
select distinct pkid
from tblPairs as t1
where 2 = (
select count(*)
from tblPairs as t2
where t2.pkid = t1.pkid
and colPercentage >= 120)
and 9 = (
select count(*)
from tblPairs as t2
where t2.pkid = t1.pkid
and colPercentage <= 110)
;
source to share
Do not use the table for comparison, and then revert to the original format if necessary .:
with
unpvt as (
select
YearNo
,case MonthNo when 1 then colJan
when 2 then colFeb
when 3 then colMar
when 4 then colApr
when 5 then colMay
when 6 then colJun
when 7 then colJul
when 8 then colAug
when 9 then colSep
when 10 then colOct
when 11 then colNov
when 12 then colDec
else 0
end as Value
,YearNo * 12 + MonthNo - 1 as PeriodNo
/* other columns */
-- from dbo.tblMonthData
from tblMonthData
cross join ( values
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
)months(MonthNo)
)
select
this.*,
this.Value - isnull(prev.Value,0) as Delta
from unpvt this
left join unpvt prev
on prev.PeriodNo = this.PeriodNo - 1
;
etc .. Depending on your version of SQL Server, you may have access to the UNPIVOT article to compress the source even further.
If you have a static NUMBERS table , then the first twelve rows from it can be used instead of initializing VALUE in a CROSS JOIN.
source to share
You can turn off your data row by row in cross-use by using the table-valued constructor that creates a derived table with two consecutive month values. In addition, you can perform calculations and count rows that match certain criteria using case and sum .
Your request might look something like this.
select MD.pkID
from dbo.tblMonthData as MD
cross apply (
select sum(case when P.Mon1 * 1.2 < P.Mon2 then 1 end),
sum(case when P.Mon1 * 1.1 > P.Mon2 then 1 end)
from (values(MD.colJan, MD.colFeb),
(MD.colFeb, MD.colMar),
(MD.colMar, MD.colApr),
(MD.colApr, MD.colMay),
(MD.colMay, MD.colJun),
(MD.colJun, MD.colJul),
(MD.colJul, MD.colAug),
(MD.colAug, MD.colSep),
(MD.colSep, MD.colOct),
(MD.colOct, MD.colNov),
(MD.colNov, MD.colDec)) as P(Mon1, Mon2)
) as C(MoreThan20, LessThan10)
where C.MoreThan20 = 2 and
C.LessThan10 = 9
source to share