Improve query performance for SAP LMS API call that requires individual ID per call

I need to iterate through the Users endpoint of the SAP LMS API to return a custom column for each user. The problem is that I have over 25,000 users, and the API call must pass each user ID as a parameter.
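The per-user URL in the code, `Users('<id>')`, looks like OData key syntax. If any user IDs can contain a single quote, a space, or a slash, a plain f-string would produce a malformed URL. The helper below is a small sketch that assumes the service follows the standard OData rule for string literals (a single quote inside the key is written as two single quotes). It uses the placeholder host from the question, and the name `user_url` is hypothetical.

```python
from urllib.parse import quote

BASE_URL = "https://test.com"  # placeholder host taken from the question


def user_url(user_id):
    """Build the single-user URL Users('<id>') for an OData string key."""
    # OData writes a single quote inside a string literal as two single quotes
    literal = str(user_id).replace("'", "''")
    # Percent-encode characters that are unsafe in a URL path (spaces, '/', ...)
    # while leaving the quote characters readable
    encoded = quote(literal, safe="'")
    return f"{BASE_URL}/Users('{encoded}')"


print(user_url("12345"))   # https://test.com/Users('12345')
print(user_url("O'Neil"))  # https://test.com/Users('O''Neil')
```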

Currently my code takes about 8 minutes per 500 records. At that rate, all 25,000 users would take more than 6 hours. I'm struggling to find a much more efficient approach, and I can't wait that long for this process to finish.
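A back-of-the-envelope check of these figures. The first function extrapolates linearly from the measured rate. The second gives an idealized lower bound for the case where requests overlap. The 1-second latency and 20 concurrent requests are hypothetical inputs, not measurements from this API.

```python
def extrapolate_minutes(total_records, batch_records, batch_minutes):
    """Scale an observed batch time linearly to the full data set."""
    return total_records / batch_records * batch_minutes


def ideal_minutes(total_requests, latency_s, in_flight):
    """Lower bound on wall-clock time when `in_flight` requests, each taking
    `latency_s` seconds, are always running (ignores rate limits and load)."""
    return total_requests * latency_s / in_flight / 60


observed = extrapolate_minutes(25_000, 500, 8)
print(f"observed rate: {observed:.0f} min ({observed / 60:.1f} h)")
# -> observed rate: 400 min (6.7 h)

# Hypothetical: 1 s per request with 20 requests in flight
print(f"ideal with 20 in flight: {ideal_minutes(25_000, 1.0, 20):.1f} min")
# -> ideal with 20 in flight: 20.8 min
```

8 minutes per 500 records is roughly one completed request per second. `ThreadPoolExecutor()` without `max_workers` already uses `min(32, os.cpu_count() + 4)` threads (Python 3.8+), so that rate may mean each request takes several seconds or that the server is throttling. Timing a single request in isolation would tell the two apart.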

I have tried Spark, asyncio, and aiohttp, but none of them improved the runtime much. Any guidance would be greatly appreciated.
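asyncio and aiohttp only help when many requests are actually in flight at the same time. Awaiting each request inside a `for` loop still runs them one after another. The sketch below shows bounded concurrency with `asyncio.gather` and `asyncio.Semaphore`. `fake_fetch` is a hypothetical stand-in for the real request; in practice it would use one shared `aiohttp.ClientSession`. Using a stand-in lets the sketch run without network access.

```python
import asyncio
import time


async def fake_fetch(user_id):
    """Hypothetical stand-in for one HTTP call (e.g. via a shared aiohttp session)."""
    await asyncio.sleep(0.05)  # simulated network latency
    return user_id, f"value-for-{user_id}"


async def fetch_all_async(user_ids, max_concurrency=20):
    """Run fake_fetch for every ID with at most max_concurrency calls in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(uid):
        async with sem:  # wait here while max_concurrency calls are running
            return await fake_fetch(uid)

    # gather schedules every coroutine up front and returns results in input order
    return await asyncio.gather(*(bounded(uid) for uid in user_ids))


if __name__ == "__main__":
    ids = [str(i) for i in range(200)]
    start = time.perf_counter()
    results = asyncio.run(fetch_all_async(ids))
    # about 0.5 s here, versus about 10 s if each call were awaited one at a time
    print(len(results), f"{time.perf_counter() - start:.2f}s")
```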

I understand that calling the endpoint once per user ID is inefficient, but I can't find another option, because the data must come from this specific endpoint.

In the code below, `user_df` is a DataFrame that contains all of the user data. I iterate through the user IDs in this DataFrame and pass each ID as a parameter in the endpoint call.

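```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd
import requests

# `headers` (authentication headers for the API) and `user_df` are defined
# earlier in the script and are not shown here.


def find_cust_col(user_id):
    """Fetch one user; return (user_id, value of custom column 10) or None."""
    cust_col_endpoint = f"https://test.com/Users('{user_id}')"
    # A timeout keeps one stalled request from blocking a worker thread forever
    # (the 30 s value is an arbitrary choice)
    response = requests.get(cust_col_endpoint, headers=headers, timeout=30)

    if response.status_code == 200:
        data = response.json()
        # Value of the custom column whose columnNumber is 10, if present
        custom_data = next(
            (item['value'] for item in data['customColumn'] if item['columnNumber'] == 10),
            None,
        )
        return user_id, custom_data

    print(f"No data found for user {user_id}, status code {response.status_code}")
    return None


# Send the requests from a pool of worker threads (default pool size)
with ThreadPoolExecutor() as executor:
    results = list(executor.map(find_cust_col, user_df['user_id']))

# Drop users whose request failed
results = [result for result in results if result is not None]

result_df = pd.DataFrame(results, columns=['user_id', 'cust_col'])
```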
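One possible adaptation of the code above, sketched under assumptions. It gives the thread pool an explicit `max_workers` instead of the default, and it retries throttled or transient responses with exponential backoff instead of dropping them. The HTTP call is injected as a hypothetical `call(user_id)` that returns `(status_code, payload)`, so the sketch runs without the real API. In practice `call` would wrap a `requests` call, ideally on a `requests.Session` kept per thread so TCP/TLS connections are reused. The right `max_workers` depends on whatever rate limit the SAP API enforces, which the question does not state.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Status codes treated as transient (an assumption; check the API's documentation)
RETRYABLE = {429, 502, 503, 504}


def with_retries(call, user_id, max_attempts=5, base_delay=0.5):
    """Run call(user_id) -> (status_code, payload) with exponential backoff.

    Returns (user_id, payload) on HTTP 200, or None after a non-retryable
    status or once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        status, payload = call(user_id)
        if status == 200:
            return user_id, payload
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return None
        # Wait base_delay * 1, 2, 4, ... plus random jitter so threads desynchronize
        time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    return None


def fetch_all(call, user_ids, max_workers=32, **retry_kwargs):
    """Fetch every user on a fixed-size thread pool; failed users are dropped."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda uid: with_retries(call, uid, **retry_kwargs), user_ids)
        return [r for r in results if r is not None]
```

Raising `max_workers` only helps until the server starts throttling. If most responses become 429s, the retry path dominates, and a lower concurrency may finish sooner.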
