I am trying to write a Python function that queries all data for a Salesforce object, using the salesforce_bulk Python library. This is an overview of what I've written:
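import json
import time

import pandas as pd
from salesforce_bulk.util import IteratorBytesIO


def fetch_salesforce_object_data(bulk, object_name):
    # NOTE: query (the SOQL string) is not shown in this overview; it is assumed to be defined elsewhere.
    job = bulk.create_queryall_job(object_name=object_name, contentType='JSON')
    batch = bulk.query(job, query)
    # Wait for batch completion
    while not bulk.is_batch_done(batch):
        time.sleep(10)
    bulk.close_job(job)
    # Parse each result set into its own DataFrame
    final_result_list = []
    for result in bulk.get_all_results_for_query_batch(batch, job):
        result = json.load(IteratorBytesIO(result))
        df = pd.DataFrame(result)
        final_result_list.append(df)
    # Combine all result sets into a single DataFrame
    df = pd.concat(final_result_list)
    return df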
This gives me the results I want, but it runs very slowly. While debugging, I found that, according to "Bulk Data Load Jobs", Salesforce returns the query results very quickly, but the code takes far too long to execute at this line: result = json.load(IteratorBytesIO(result)). Is there something I can replace IteratorBytesIO with? Is there any other way to fetch the query results?
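To make the kind of replacement I am asking about concrete, here is a minimal sketch. It assumes each result is an iterable of bytes chunks, which is an assumption about what the library yields and not something I have confirmed. It joins the chunks once and parses them with json.loads instead of going through a file-like wrapper. Both fake_result_chunks and parse_chunked_json are hypothetical names used only for illustration, and the stand-in generator replaces the real batch result so the snippet runs without Salesforce.

```python
import json


def fake_result_chunks(rows, chunk_size=16):
    """Hypothetical stand-in for one batch result: yields a JSON payload in small byte chunks."""
    payload = json.dumps(rows).encode("utf-8")
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


def parse_chunked_json(chunks):
    """Join all byte chunks in one pass, then decode the JSON document once."""
    # json.loads accepts bytes directly on Python 3.6+
    return json.loads(b"".join(chunks))


rows = [{"Id": "001", "Name": "Acme"}, {"Id": "002", "Name": "Globex"}]
parsed = parse_chunked_json(fake_result_chunks(rows))
```

Whether this is actually faster than json.load(IteratorBytesIO(result)) on real data is exactly what I am unsure about; it only shows the shape of the alternative.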
I have tried using multithreading to unpack the IteratorBytesIO object that the API returns, along with some other solutions, but haven't been able to improve the speed. Any other suggestions would be helpful, thanks!