How to count people in difficult groups by age / gender / etc?
I got the following table Patients
.
HospitalId INT,
GenderId BIT,
Age TINYINT,
DiseaseId SMALLINT
GenderId
= 0 is Male
GenderId
= 1 is feminine
HospitalA has HospitalId
0
HospitalB has HospitalId
1
Here's the output I want to generate:
DiseaseId | HospitalA_Male_18-30 | HospitalA_Male_31-40 |
---------------------------------------------------------
0 | (count here) | (count here) |
1 | (count here) | (count here) |
2 | (count here) | (count here) |
3 | (count here) | (count here) |
(columns continue)
HospitalA_Female_18-30 | HospitalA_Female_31-40 |
-------------------------------------------------
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(columns continue)
HospitalB_Male_18-30 | HospitalB_Male_31-40 |
---------------------------------------------
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(columns continue)
HospitalB_Female_18-30 | HospitalB_Female_31-40 |
-------------------------------------------------
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(9 columns in the result set)
So, as you can see, I really need to calculate, for each disease, how many patients have the disease in each specific group (by hospital, by gender and by age).
How can such grouping be done (most efficiently) in T-SQL?
source to share
You can do this with a pivot query:
select * from
(
select diseaseid,
'Hospital'
+ case hospitalid when 0 then 'A' when 1 then 'B' end
+ '_'
+ case genderid when 1 then 'Female' else 'Male' end
+ '_'
+ case when age between 18 and 30
then '18-30'
else (case when age between 31 and 40 then '31-40' end)
end Title,
1 Cnt
from Patients
where age between 18 and 40
) t
pivot (
count (Cnt) for Title in (
[HospitalA_Male_18-30], [HospitalA_Male_31-40],
[HospitalA_Female_18-30], [HospitalA_Female_31-40],
[HospitalB_Male_18-30], [HospitalB_Male_31-40],
[HospitalB_Female_18-30], [HospitalB_Female_31-40]
)
) as Q
UPDATE
As a development of the above solution, you can also transfer parts of the names from CASE expressions to your own virtual tables and join the table Patients
to them:
;with
hospital (hospitalid, hospitalname) as (
select 0, 'HospitalA' union all
select 1, 'HospitalB'
),
gender (genderid, gendername) as (
select 0, 'Male' union all
select 1, 'Female'
),
agerange (agefrom, ageto) as (
select 18, 30 union all
select 31, 40
)
select * from
(
select p.diseaseid,
h.hospitalname + '_' + g.gendername + '_'
+ rtrim(a.agefrom) + '-' + rtrim(a.ageto) as Title,
1 Cnt
from Patients p
inner join hospital h on p.hospitalid = h.hospitalid
inner join gender g on p.genderid = g.genderid
inner join agerange a on p.age between a.agefrom and a.ageto
where p.age between 18 and 40
) t
pivot (
count (Cnt) for Title in (
[HospitalA_Male_18-30], [HospitalA_Male_31-40],
[HospitalA_Female_18-30], [HospitalA_Female_31-40],
[HospitalB_Male_18-30], [HospitalB_Male_31-40],
[HospitalB_Female_18-30], [HospitalB_Female_31-40]
)
) as Q
The overhead of adding subqueries and joins is offset by the greater ease of maintenance:
-
part of the data (meta) is separated from the logical part;
-
name part lists are more convenient to expand as needed;
-
the concatenation expression is easier to change if you need to change the format of the target column names.
source to share
Please, try
SELECT
DiseaseId,
SUM(CASE WHEN HospitalId = 0 AND GenderId=0 AND (Age BETWEEN 18 AND 30) THEN 1 ELSE 0 END) AS [HospitalA_Male_18-30],
SUM(CASE WHEN HospitalId = 0 AND GenderId=0 AND (Age BETWEEN 31 AND 40) THEN 1 ELSE 0 END) AS [HospitalA_Male_31-40],
SUM(CASE WHEN HospitalId = 0 AND GenderId=1 AND (Age BETWEEN 18 AND 30) THEN 1 ELSE 0 END) AS [HospitalA_Female_18-30],
......
FROM Patients
GROUP BY DiseaseId
ORDER BY DiseaseId
source to share