How can I count the users in a month who were absent in the previous month?

I am trying to count, month by month, the unique users who were not present in the previous month. So if a user has a record for January and then another for February, I would count that user only for January.

user_id    time
a1         1/2/17
a1         2/10/17
a2         2/18/17
a4         2/5/17
a5         3/25/17

      

My results should look like this:

Month   User Count
January     1
February    2
March       1

      



3 answers


I'm not very familiar with BigQuery, but here's how I would solve the problem using T-SQL. I assume that you can use similar logic in BigQuery.

1). First, number each user's rows in time order. In T-SQL you can do this with ROW_NUMBER() and store the result in a common table expression (CTE) that the next step queries.

;WITH cte AS
(
select  ROW_NUMBER() OVER (PARTITION BY [user_id] ORDER BY [time]) AS rn,*
from    dbo.employees
)

      

2). Then query only the rows with rn = 1 (the first occurrence of each user) and group them by month.



select    DATENAME(month, [time]) AS [Month], count(*) AS user_count 
from      cte
where     rn = 1    
group by  DATENAME(month, [time])

      

This assumes 2017 is the only year you are dealing with. If your data spans more than one year, you probably want step 2 to look something like this:

select    year([time]) as [year],  DATENAME(month, [time]) AS [month], 
          count(*) AS user_count 
from      cte
where     rn = 1    
group by  year([time]), DATENAME(month, [time])
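
For BigQuery itself, the same two steps might look roughly like the sketch below. It is an untested translation rather than part of the T-SQL above: yourTable is a placeholder name, and time is assumed to be a string such as '1/2/17' that PARSE_DATE('%x', ...) can read. Like the T-SQL version, it counts each user only in the month of their first record.

```sql
#standardSQL
-- Sketch only: `yourTable` is a placeholder name, and `time` is assumed
-- to be a STRING in M/D/YY form, as in the question's sample rows.
WITH cte AS (
  SELECT
    user_id,
    DATE_TRUNC(PARSE_DATE('%x', time), MONTH) AS month_start,
    -- rn = 1 marks each user's earliest record
    ROW_NUMBER() OVER (PARTITION BY user_id
                       ORDER BY PARSE_DATE('%x', time)) AS rn
  FROM yourTable
)
SELECT month_start, COUNT(*) AS user_count
FROM cte
WHERE rn = 1
GROUP BY month_start
ORDER BY month_start
```

Because month_start is a full date (the first day of the month), the same month in different years stays separate without an extra year column.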

      



First, aggregate by user id and month. Then use lag() to see if the user was present in the previous month:

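with du as (
      -- one row per user per month
      select date_trunc(time, month) as yyyymm, user_id
      from t
      group by date_trunc(time, month), user_id
     )
select yyyymm, count(*) as user_count
from (select du.*,
             lag(yyyymm) over (partition by user_id order by yyyymm) as prev_yyyymm
      from du
     ) du
-- count a user in their first month, or when their previous active month
-- is earlier than the month immediately before this one
where prev_yyyymm is null or
      prev_yyyymm < date_sub(yyyymm, interval 1 month)
group by yyyymm;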

      



Note: this uses the date functions, but similar functions exist for timestamp.
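
To try the lag() idea without a real table, here is a self-contained sketch. It assumes time is already a DATE, and it adds one hypothetical row (a1 on 2017-04-01) that is not in the question, to show what happens when a user comes back after skipping a month.

```sql
#standardSQL
-- Sketch only: the question's rows as DATE values, plus one hypothetical
-- row that is NOT in the original data.
WITH t AS (
  SELECT 'a1' AS user_id, DATE '2017-01-02' AS time UNION ALL
  SELECT 'a1', DATE '2017-02-10' UNION ALL
  SELECT 'a2', DATE '2017-02-18' UNION ALL
  SELECT 'a4', DATE '2017-02-05' UNION ALL
  SELECT 'a5', DATE '2017-03-25' UNION ALL
  SELECT 'a1', DATE '2017-04-01'  -- hypothetical: a1 returns after skipping March
),
du AS (
  -- one row per user per month
  SELECT DISTINCT DATE_TRUNC(time, MONTH) AS yyyymm, user_id
  FROM t
)
SELECT yyyymm, COUNT(*) AS user_count
FROM (
  SELECT du.*,
         LAG(yyyymm) OVER (PARTITION BY user_id ORDER BY yyyymm) AS prev_yyyymm
  FROM du
)
WHERE prev_yyyymm IS NULL                               -- first month for this user
   OR prev_yyyymm < DATE_SUB(yyyymm, INTERVAL 1 MONTH)  -- absent in the month before
GROUP BY yyyymm
ORDER BY yyyymm
```

With these rows the counts should be 1, 2 and 1 for January to March, and a1 is counted again in April because March is missing for that user.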



As I understand it, a user should be excluded from a given month's count only if the same user was also present in the previous month. If the user was present several months earlier but not in the month immediately before, the user should be counted.

If that is correct, try the query below for BigQuery Standard SQL:

#standardSQL
SELECT Year, Month, COUNT(DISTINCT user_id) AS User_Count
FROM (
  SELECT *,
    DATE_DIFF(time, LAG(time) OVER(PARTITION BY user_id ORDER BY time), MONTH) AS flag
  FROM (
    SELECT 
      user_id,
      DATE_TRUNC(PARSE_DATE('%x', time), MONTH) AS time, 
      EXTRACT(YEAR FROM PARSE_DATE('%x', time)) AS Year, 
      FORMAT_DATE('%B', PARSE_DATE('%x', time)) AS Month
    FROM yourTable
    GROUP BY 1, 2, 3, 4
  )
)
WHERE IFNULL(flag, 0) <> 1
GROUP BY Year, Month, time
ORDER BY time 

      

You can test and play with the above using the dummy data from your question:

#standardSQL
WITH yourTable AS (
  SELECT 'a1' AS user_id, '1/2/17' AS time UNION ALL
  SELECT 'a1', '2/10/17' UNION ALL
  SELECT 'a2', '2/18/17' UNION ALL
  SELECT 'a4', '2/5/17' UNION ALL
  SELECT 'a5', '3/25/17' 
)
SELECT Year, Month, COUNT(DISTINCT user_id) AS User_Count
FROM (
  SELECT *,
    DATE_DIFF(time, LAG(time) OVER(PARTITION BY user_id ORDER BY time), MONTH) AS flag
  FROM (
    SELECT 
      user_id,
      DATE_TRUNC(PARSE_DATE('%x', time), MONTH) AS time, 
      EXTRACT(YEAR FROM PARSE_DATE('%x', time)) AS Year, 
      FORMAT_DATE('%B', PARSE_DATE('%x', time)) AS Month
    FROM yourTable
    GROUP BY 1, 2, 3, 4
  )
)
WHERE IFNULL(flag, 0) <> 1
GROUP BY Year, Month, time
ORDER BY time  

      

Output:

Year    Month       User_Count   
2017    January     1    
2017    February    2    
2017    March       1    

      
