Paginate CSV (Python)
How can I paginate the CSV version of the API call using Python?
I understand that the metadata in the JSON call includes the total number of records, but without this kind of information in the CSV call, I won't know where to stop my loop if I try to increase the page parameter.
Below is my code:
import pandas as pd
import requests

url = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'
payload = {
    'api_key': '4KC***UNKk',
    'fields': 'school.name,2012.repayment.2_yr_default_rate',
    '_page': '0'
}
r = requests.get(url, params=payload)
df = pd.read_csv(r.url)
This loads a dataframe with the first 20 results, but I would like to load a dataframe with all the results.
Use the &_per_page parameter to change the number of results returned by each call. Setting it to &_per_page=200 still returns a CSV with only 100 lines, so let's assume 100 is the maximum.
Now that we know the maximum per call, and the total number of records is available from the JSON metadata, we can run a for loop over the pages to get what we need, for example:
url = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'
apikey = '&api_key=xxx'
fields = '&_fields=school.name,2012.repayment.2_yr_default_rate'
pageA = '?_page='  # first query parameter, so it starts with '?'
pageTotal = '&_per_page='
pageNumbersMaximum = 70  # 70 pages x 100 rows = 7000 rows
perPage = 100  # the most rows the API returned per call
rowSum = 0  # running count of rows requested so far
for page in range(pageNumbersMaximum):
    fullURL = url + pageA + str(page) + pageTotal + str(perPage) + fields + apikey
    print(fullURL)
    rowSum += perPage
    print("Page Number: " + str(page) + ", Total Rows: " + str(rowSum))
This will step through the pages until it reaches 7000 rows. At 100 rows per page that takes 70 pages, so pageNumbersMaximum must be at least 70.
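If you would rather not hard-code the page count, one option is to keep requesting pages until one comes back short or empty. The sketch below is only an illustration under some assumptions: fetch_page and fetch_all_rows are hypothetical helper names, every CSV page is assumed to start with a header row, and a page past the end is assumed to come back with no data rows rather than an error. I have not verified the last two points against the API.

```python
import csv
import io
import urllib.parse
import urllib.request

BASE_URL = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'


def fetch_page(page, per_page, api_key, fields):
    """Download one CSV page and return it as text (makes a network call)."""
    query = urllib.parse.urlencode({
        'api_key': api_key,
        '_fields': fields,
        '_page': page,
        '_per_page': per_page,
    })
    with urllib.request.urlopen(BASE_URL + '?' + query) as resp:
        return resp.read().decode('utf-8')


def fetch_all_rows(get_page, per_page=100, max_pages=1000):
    """Collect rows page by page, stopping at the first short or empty page.

    get_page(page, per_page) must return the CSV text for that page.
    max_pages is a safety cap so the loop always terminates.
    """
    rows = []
    for page in range(max_pages):
        text = get_page(page, per_page)
        page_rows = list(csv.DictReader(io.StringIO(text)))
        rows.extend(page_rows)
        # Assumption: a page with fewer rows than requested is the last one.
        if len(page_rows) < per_page:
            break
    return rows


# Example usage (commented out because it needs a real API key):
# rows = fetch_all_rows(
#     lambda page, n: fetch_page(
#         page, n, 'YOUR_API_KEY',
#         'school.name,2012.repayment.2_yr_default_rate'))
# df = pd.DataFrame(rows)
```

Keep per_page at or below the real maximum (100 here). If you ask for more than the API will return, every page looks "short" and the loop stops after the first one.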