BigQuery: When is GHTorrent updated and how do I get the latest information?
(related to fooobar.com/questions/2400213 / ... )
GHTorrent only provides periodic snapshots of its data to BigQuery, while GitHub Archive is updated daily (possibly even hourly).
It would be great to have more frequent GHTorrent snapshots (pinging https://twitter.com/gousiosg might help). In the meantime, you can combine both datasets: take the stars recorded in the GHTorrent snapshot, then add the latest stars from GitHub Archive. COUNT(DISTINCT login) ensures that a user who appears in both sources is counted only once:
#standardSQL
SELECT COUNT(DISTINCT login) c
FROM (
  -- stargazers recorded in the 2017-01-19 GHTorrent snapshot
  SELECT login
  FROM (
    SELECT c.login
    FROM `ghtorrent-bq.ght_2017_01_19.watchers` a
    JOIN `ghtorrent-bq.ght_2017_01_19.projects` b
      ON a.repo_id = b.id
    JOIN `ghtorrent-bq.ght_2017_01_19.users` c
      ON a.user_id = c.id
    WHERE b.url = 'https://api.github.com/repos/angular/angular'
  )
  UNION ALL (
    -- stars (WatchEvents) recorded by GitHub Archive during 2017
    SELECT actor.login
    FROM `githubarchive.month.2017*`
    WHERE repo.name = 'angular/angular'
      AND type = 'WatchEvent'
  )
)
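If you only need the stars added after the snapshot was taken (for example, to see how stale the snapshot is), the GitHub Archive side can be narrowed to daily tables. This is a sketch that assumes the githubarchive.day tables are named by date (YYYYMMDD), so the _TABLE_SUFFIX filter keeps only 2017-01-19 onward. It counts users with a WatchEvent in that window, and some of them may already appear in the snapshot, so use the combined query above for a deduplicated total.

```sql
#standardSQL
-- Sketch: stargazers of angular/angular seen by GitHub Archive on or after
-- the 2017-01-19 GHTorrent snapshot date.
-- Assumes daily tables named githubarchive.day.YYYYMMDD, so with the
-- wildcard `githubarchive.day.2017*` the _TABLE_SUFFIX value is 'MMDD'.
SELECT COUNT(DISTINCT actor.login) AS new_stargazers
FROM `githubarchive.day.2017*`
WHERE _TABLE_SUFFIX >= '0119'   -- only read tables from Jan 19 onward
  AND repo.name = 'angular/angular'
  AND type = 'WatchEvent'
```

Because the _TABLE_SUFFIX filter restricts which daily tables BigQuery reads, this query also scans less data than one over the whole month.2017* wildcard.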