How do I get the total GitHub stars for a given repo in BigQuery?

My goal is to track the popularity of my BigQuery repo over time.

I want to use public BigQuery datasets such as GitHub Archive or the GitHub dataset.

The sample_repos table in the GitHub dataset does not contain snapshots of the star count:

SELECT
  watch_count
FROM
  [bigquery-public-data:github_repos.sample_repos]
WHERE
  repo_name == 'angular/angular'

      

returns 5318.

GitHub Archive is a timeline of events. I can try to sum them up, but the totals don't match the numbers in the GitHub frontend, probably because the sum does not account for unstar actions. Here is the query I used:

SELECT
  COUNT(*)
FROM
  [githubarchive:year.2011],
  [githubarchive:year.2012],
  [githubarchive:year.2013],
  [githubarchive:year.2014],
  [githubarchive:year.2015],
  [githubarchive:year.2016],
  TABLE_DATE_RANGE([githubarchive:day.], TIMESTAMP('2017-01-01'), TIMESTAMP('2017-03-30') )
WHERE
  repo.name == 'angular/angular'
  AND type = "WatchEvent"

      

returns 24144.

The real value is 21,921.


1 answer


#standardSQL
SELECT 
  COUNT(*) naive_count,
  COUNT(DISTINCT actor.id) unique_by_actor_id, 
  COUNT(DISTINCT actor.login) unique_by_actor_login 
FROM `githubarchive.month.*` 
WHERE repo.name = 'angular/angular'
AND type = "WatchEvent"

      


Naive count: some people star, unstar, and then star again. This creates duplicate WatchEvents.

Unique by actor ID: each person can only star once. We can count distinct actors (but we don't know whether they later removed their star, so the true total will be lower than this).

Unique by actor login: in some historical months the actor.id field is missing. We can look at the actor.login field instead (but some people change their logins).
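Since the question is about tracking popularity over time, the unique-by-login idea can be extended into a monthly series. The query below is a sketch rather than part of the original answer. It assumes that created_at is a TIMESTAMP column in the githubarchive.month.* tables, and it counts each login only at its first WatchEvent. Unstars are not visible in this data, so the cumulative column should be read as an upper bound. The aliases first_star, star_month, new_stargazers and cumulative_stargazers are illustrative names.

```sql
#standardSQL
-- Sketch: monthly and cumulative count of first-time stargazers.
-- Assumes created_at is a TIMESTAMP; unstars are not tracked, so this overcounts.
SELECT
  star_month,
  new_stargazers,
  -- Running total of first-time stargazers up to and including this month.
  SUM(new_stargazers) OVER (ORDER BY star_month) AS cumulative_stargazers
FROM (
  SELECT
    FORMAT_TIMESTAMP('%Y-%m', first_star) AS star_month,
    COUNT(*) AS new_stargazers
  FROM (
    -- Keep only the earliest WatchEvent per login to drop re-star duplicates.
    SELECT
      actor.login AS login,
      MIN(created_at) AS first_star
    FROM `githubarchive.month.*`
    WHERE repo.name = 'angular/angular'
      AND type = 'WatchEvent'
    GROUP BY login
  )
  GROUP BY star_month
)
ORDER BY star_month
```

The last row's cumulative_stargazers should equal the unique-by-login count from the query above, which gives a quick consistency check.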

Alternatively, thanks to the GHTorrent project:

#standardSQL
SELECT COUNT(*) stars
FROM `ghtorrent-bq.ght_2017_01_19.watchers` a
JOIN `ghtorrent-bq.ght_2017_01_19.projects` b
ON a.repo_id=b.id
WHERE url = 'https://api.github.com/repos/angular/angular'
LIMIT 10

      



This returns 20567, as of 2017/01/19.


Related:

  • What happens when a project changes its name?

fooobar.com/questions/2400210 / ...

  • How do I get updated GHtorrent data before updating it?

fooobar.com/questions/2400230 / ...
