BigQuery GitHub data: how to handle repo name changes?

My goal is to keep track of the total stars of my repo. However, its repo.name has changed over time. How can I achieve this with the githubarchive dataset?



1 answer


(related to fooobar.com/questions/2400213 / ... )

GitHub projects get renamed over time, so it is safer to query by repo ID than by name. You can look up the project ID in a separate query, or do it all in a single query like this:



-- Count stars (WatchEvent rows) for a repo identified by its stable ID.
SELECT
  COUNT(*) naive_count,
  COUNT(DISTINCT actor.id) unique_by_actor_id,
  COUNT(DISTINCT actor.login) unique_by_actor_login
FROM `githubarchive.month.*`
WHERE repo.id = (
  -- Resolve the ID from a month in which the repo had this name.
  SELECT repo.id
  FROM `githubarchive.month.201702`
  WHERE repo.name = 'bazelbuild/bazel'
  LIMIT 1)
AND type = 'WatchEvent'
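
The "separate request" variant mentioned above could look roughly like the sketch below. It assumes the same tables as the query above. The literal ID in step 2 is a hypothetical placeholder, not the real ID of any repo, and should be replaced with whatever step 1 returns.

```sql
-- Step 1: look up the stable repo ID from a month in which
-- the repo is known to have had this name.
SELECT repo.id
FROM `githubarchive.month.201702`
WHERE repo.name = 'bazelbuild/bazel'
LIMIT 1;

-- Step 2: count stars by ID, so events recorded under earlier
-- or later repo names are included.
SELECT
  COUNT(DISTINCT actor.id) AS unique_stars
FROM `githubarchive.month.*`
WHERE repo.id = 123456  -- hypothetical placeholder: use the ID from step 1
  AND type = 'WatchEvent';
```

Running the lookup separately keeps the expensive wildcard scan free of a subquery and lets you reuse the ID in later queries.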

      
