Impala Query: Combine Multiple COUNT DISTINCT WHERE
On impala, I'm trying to count the number of distinct sample IDs that contain "101", "102", or "103".
Here's an example of my data:
| sample_id |
___________
| 101-345-5 |
| 101-345-6 |
| 101-345-6 |
| 102-345-5 |
| 103-345-5 |
| 103-345-8 |
| 103-345-8 |
I would like to get the number of different sample IDs in each study group:
| Study | Count |
_______________
| 101 | 2 |
| 102 | 1 |
| 103 | 2 |
I can easily create separate queries to find the number in each group:
SELECT COUNT(DISTINCT ill.sample_id) as 101_count
FROM illumina_data ill
WHERE SUBSTRING(ill.sample_id, 1,3) = "101"
But I would really like to figure out how to use a single query to find the score for 101, 102 and 103.
+3
source to share
1 answer
The group will do the trick http://sqlfiddle.com/#!9/1d75f/6
SELECT SUBSTRING(sample, 1,3) , COUNT(DISTINCT sample)
FROM samples
group by SUBSTRING(sample, 1,3);
+2
source to share