I want to test a dlt source that pulls from an API, but I don't want to exceed my API limits. How can I set up the source so that it only produces a sample of the data?
I could go into the source code and add a counter, for example:
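import requests

def get_page(endpoint, headers, params):
    # Pass headers by keyword: the second positional argument of requests.get is params.
    res = requests.get(endpoint, headers=headers, params=params).json()
    count_max = 10  # maximum number of pages to fetch while testing
    count = 0
    while res is not None:
        yield res["result"]
        count += 1
        # Stop once count_max pages have been yielded.
        if count >= count_max:
            return
        has_more = res.get("paging", {}).get("next", None)
        if has_more:
            next_url = has_more["link"]
            res = requests.get(next_url, headers=headers).json()
        else:
            res = None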
But I don't want to go inside the source definition each time I need to test something.
If the pagination follows a clear pattern (for example, page numbers in the URL), you can add a run-mode setting to your dlt.config and, when the pipeline runs in test mode, request only a few randomly chosen pages instead of all of them.
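A minimal sketch of that idea, using only the standard library. The names pick_pages, page_urls and run_mode, the "?page=N" URL pattern, and the example URL are all assumptions made for illustration; in a real pipeline the run mode would be read from your dlt configuration rather than passed in by hand.

```python
import random

def pick_pages(run_mode, total_pages, sample_size=3, seed=None):
    """Return the page numbers to request for the given run mode."""
    all_pages = list(range(1, total_pages + 1))
    if run_mode != "test":
        # Any mode other than "test" requests every page.
        return all_pages
    rng = random.Random(seed)  # a seed makes the sample reproducible
    k = min(sample_size, total_pages)
    return sorted(rng.sample(all_pages, k))

def page_urls(base_url, pages):
    # Assumes the API paginates with a "page" query parameter.
    return [f"{base_url}?page={n}" for n in pages]

# Hypothetical usage: in test mode only three of 100 pages are requested.
sample = pick_pages("test", total_pages=100, sample_size=3, seed=42)
urls = page_urls("https://api.example.com/items", sample)
```

Because the switch lives in configuration, the source definition itself does not need to be edited between test runs and full runs.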
If there is no clear pattern in how the page links are constructed, you can use the VCR.py library (vcrpy) to record the responses the first time you run your tests. Subsequent test runs are then served from the saved responses instead of calling the API.
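To show what record-and-replay means, here is a rough standard-library sketch of the idea. It is not VCR.py itself: fetch_with_cassette, fake_fetch and the JSON "cassette" file are hypothetical stand-ins, and the fake fetch function replaces a real HTTP call so the example runs offline.

```python
import json
import os
import tempfile

def fetch_with_cassette(url, cassette_path, fetch):
    """Return the response for url, replaying it from disk if already recorded."""
    recorded = {}
    if os.path.exists(cassette_path):
        with open(cassette_path) as f:
            recorded = json.load(f)
    if url in recorded:
        return recorded[url]  # replay: no request is made
    response = fetch(url)  # record: only the first run calls the API
    recorded[url] = response
    with open(cassette_path, "w") as f:
        json.dump(recorded, f)
    return response

# Stand-in for a real HTTP call; it logs each call so we can count them.
calls = []

def fake_fetch(url):
    calls.append(url)
    return {"result": [1, 2, 3], "paging": {}}

cassette = os.path.join(tempfile.mkdtemp(), "cassette.json")
first = fetch_with_cassette("https://api.example.com/items", cassette, fake_fetch)
second = fetch_with_cassette("https://api.example.com/items", cassette, fake_fetch)
```

The second call is answered from the file, so the API is contacted only once. With VCR.py you get the same effect without writing this yourself, by wrapping the test's HTTP calls in vcr.use_cassette with a path to a cassette file.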