I am pulling data with a SODA API using the following code in Python
response = requests.get('https://healthdata.gov/resource/uqq2-txqb.json')
The dataset contains 434,865 rows but when I use the API, it only returns the first 1,000 rows. I saw on another question that $limit
can be used to get the first 50,000 rows but how would I combine this with $offset
to get all 434,865 rows?
** I figured out how to use $offset
and now have the resulting code, is there any way to condense this?
response1 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000')
response2 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=50001')
response3 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=100002')
response4 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=150003')
response5 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=200004')
response6 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=250005')
response7 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=300006')
response8 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=350007')
response9 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=400008')
This is referred to as
paging
, and you can find documentation for example here: https://dev.socrata.com/docs/paging.htmlIn there, it is also specified that there are two versions of API:
$limit
can be max 50,000$limit
is unlimitedThe endpoint you are using seems to be support v2.1, at least based on this https://dev.socrata.com/foundry/healthdata.gov/uqq2-txqb so you should be able to use a large value for
$limit
and retrieve the entire set at once.Going the paging route, the
$offset
value is0-based
, hence your queries should be properly rewritten as:Note the alignment on multiples of
$limit
.