BigQuery GitHub data: how to handle repo name changes?
1 answer
(related to fooobar.com/questions/2400213 / ... )
GitHub repositories can be renamed, so instead of querying by name, it is safer to query by id. You can look up the repo id in a separate query, or do it all in a single query like this:
SELECT
  COUNT(*) naive_count,
  COUNT(DISTINCT actor.id) unique_by_actor_id,
  COUNT(DISTINCT actor.login) unique_by_actor_login
FROM `githubarchive.month.*`
WHERE repo.id = (
    SELECT repo.id
    FROM `githubarchive.month.201702`
    WHERE repo.name = 'bazelbuild/bazel'
    LIMIT 1)
  AND type = "WatchEvent"
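If you would rather do the lookup as a separate step, here is a rough sketch using a BigQuery scripting variable. The variable name `target_repo_id` is made up for illustration, and the sketch assumes `repo.id` is an INT64 column and `created_at` is a timestamp in the `githubarchive` tables. It lists every name the repo has appeared under, which is a quick way to confirm that a rename actually happened:

```sql
-- Step 1: resolve the repo id once, from a month where the current name is known to exist.
-- target_repo_id is a hypothetical variable name; repo.id is assumed to be INT64.
DECLARE target_repo_id INT64 DEFAULT (
  SELECT repo.id
  FROM `githubarchive.month.201702`
  WHERE repo.name = 'bazelbuild/bazel'
  LIMIT 1
);

-- Step 2: list every name recorded for that id, with the first and last time each was seen.
SELECT
  repo.name AS repo_name,
  MIN(created_at) AS first_seen,
  MAX(created_at) AS last_seen
FROM `githubarchive.month.*`
WHERE repo.id = target_repo_id
GROUP BY repo_name
ORDER BY first_seen;
```

Once you have the id, you can reuse `target_repo_id` in the `WHERE` clause of any later query in the same script, instead of repeating the subquery.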