Paginate CSV (Python)

Question

How can I paginate the CSV version of the API call using Python?

I understand that the metadata in the JSON call includes the total number of records, but without this kind of information in the CSV call, I won't know where to stop my loop if I try to increase the page parameter.

Below is my code:

import pandas as pd
import requests

url = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'

payload = {
    'api_key': '4KC***UNKk',
    'fields': 'school.name,2012.repayment.2_yr_default_rate',
    '_page' : '0'
}

r = requests.get(url, params=payload)
df = pd.read_csv(r.url)

This loads a dataframe with the first 20 results, but I would like to load a dataframe with all the results.

Tim Ernst Apr 20 17 at 20:12

1 answer

albert · Answer 1 · 2017-04-21T00:35:28+0000

Use the &_per_page parameter to change the number of results returned per call. Setting it to &_per_page=200 still returns a CSV with only 100 lines, so let's assume 100 is the maximum.
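One way to check that cap is to count the data rows in a response body. The sketch below is only an illustration: count_data_rows is a hypothetical helper, and the sample string stands in for the text of a real response, which is assumed to start with a header row.

```python
import csv
import io


def count_data_rows(csv_text):
    """Return the number of rows after the header in a CSV string."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # The first row is assumed to be the header, so it is not counted.
    return max(len(rows) - 1, 0)


# Stand-in for a real response body (made-up values, for illustration only).
sample = "school.name,2012.repayment.2_yr_default_rate\nA,0.1\nB,0.2\n"
print(count_data_rows(sample))  # prints 2
```

If the count stays at 100 when a larger _per_page is requested, the cap is being applied on the server side.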

Now that we know the maximum per call and the total number of calls we need, we can run a for loop to get everything, for example:

url = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'
apikey = '&api_key=xxx'
fields = '&_fields=school.name,2012.repayment.2_yr_default_rate'
pageA = '?_page='  # first query parameter, so it starts with '?'
pageTotal = '&_per_page='
pageNumbersMaximum = 10
perPage = 100  # apparent maximum number of rows per call
rowSum = 0     # running total of rows requested so far
for page in range(pageNumbersMaximum):
    # Build the URL for this page; perPage stays fixed on every call
    fullURL = url + pageA + str(page) + pageTotal + str(perPage) + fields + apikey
    print(fullURL)
    rowSum += perPage
    print("Page Number: " + str(page) + ", Total Rows: " + str(rowSum))

This will page through the results. At 100 rows per call, roughly 70 pages are needed to reach all 7000 or so records, so raise pageNumbersMaximum accordingly.
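Since the CSV response carries no total, another option is to keep requesting pages until one comes back short. The following is a minimal sketch of that idea, not a tested client for this API: fetch_page_csv and collect_all_rows are hypothetical names, and the sketch assumes every page repeats the header row and that a page past the end returns no data rows rather than an HTTP error. It uses only the standard library so the stopping logic can be tried without a network connection.

```python
import csv
import io
import urllib.parse
import urllib.request

BASE_URL = 'https://api.data.gov/ed/collegescorecard/v1/schools.csv'


def fetch_page_csv(page, per_page, api_key='xxx'):
    """Download one page of CSV text (performs a real network request)."""
    query = urllib.parse.urlencode({
        'api_key': api_key,
        '_fields': 'school.name,2012.repayment.2_yr_default_rate',
        '_page': page,
        '_per_page': per_page,
    })
    with urllib.request.urlopen(BASE_URL + '?' + query) as response:
        return response.read().decode('utf-8')


def collect_all_rows(fetch_page, per_page=100, max_pages=1000):
    """Request pages until one holds fewer than per_page data rows.

    per_page must not exceed the server's cap, otherwise the first
    page already looks short and the loop stops too early.
    """
    header, rows = None, []
    for page in range(max_pages):
        parsed = list(csv.reader(io.StringIO(fetch_page(page, per_page))))
        if not parsed:
            break  # completely empty response: nothing more to read
        # Assumption: each page starts with the same header row.
        header = header or parsed[0]
        data = parsed[1:]
        rows.extend(data)
        if len(data) < per_page:
            break  # short (or header-only) page means this was the last one
    return header, rows
```

Calling collect_all_rows(fetch_page_csv) would return the header and all rows, which can then be loaded with pd.DataFrame(rows, columns=header). The max_pages guard is there only to keep the loop from running forever if the stopping assumption turns out to be wrong.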
