I have a python code here which goes into SAP using BAPI RFC_READ_TABLE, queries USR02 table and bring back the results. The input is taken from an excel sheet A column and the output is pasted in B column The code is running all fine. However, for 1000 records, it is taking 8 minutes approximately to run. Can you please help in optimizing the code? I am really new at python, managed to write this heavy code but now stuck at the optimization part.
It would be really great if this can run in 1-2 minutes max.
from pyrfc import Connection, ABAPApplicationError, ABAPRuntimeError, LogonError, CommunicationError
from configparser import ConfigParser
from pprint import PrettyPrinter
import openpyxl
ASHOST='***'
CLIENT='***'
SYSNR='***'
USER='***'
PASSWD='***'
conn = Connection(ashost=ASHOST, sysnr=SYSNR, client=CLIENT, user=USER, passwd=PASSWD)
try:
wb = openpyxl.load_workbook('new2.xlsx')
ws = wb['Sheet1']
for i in range(1,len(ws['A'])+1):
x = ws['A'+ str(i)].value
options = [{ 'TEXT': "BNAME = '" +x+"'"}]
fields = [{'FIELDNAME': 'CLASS'},{'FIELDNAME':'USTYP'}]
pp = PrettyPrinter(indent=4)
ROWS_AT_A_TIME = 10
rowskips = 0
while True:
result = conn.call('RFC_READ_TABLE', \
QUERY_TABLE = 'USR02', \
OPTIONS = options, \
FIELDS = fields, \
ROWSKIPS = rowskips, ROWCOUNT = ROWS_AT_A_TIME)
rowskips += ROWS_AT_A_TIME
if len(result['DATA']) < ROWS_AT_A_TIME:
break
data_result = result['DATA']
length_result = len(data_result)
for line in range(0,length_result):
a= data_result[line]["WA"].strip()
wb = openpyxl.load_workbook('new2.xlsx')
ws = wb['Sheet1']
ws['B'+str(i)].value = a
wb.save('new2.xlsx')
except CommunicationError:
print("Could not connect to server.")
raise
except LogonError:
print("Could not log in. Wrong credentials?")
raise
except (ABAPApplicationError, ABAPRuntimeError):
print("An error occurred.")
raise
EDIT : So here is my updated code. For now, I have decided to output the data on command line only. Output shows where is the time taken.
try:
output_list = []
wb = openpyxl.load_workbook('new3.xlsx')
ws = wb['Sheet1']
col = ws['A']
col_lis = [col[x].value for x in range(len(col))]
length = len(col_lis)
for i in range(length):
print("--- %s seconds Start of the loop ---" % (time.time() - start_time))
x = col_lis[i]
options = [{ 'TEXT': "BNAME = '" + x +"'"}]
fields = [{'FIELDNAME': 'CLASS'},{'FIELDNAME':'USTYP'}]
ROWS_AT_A_TIME = 10
rowskips = 0
while True:
result = conn.call('RFC_READ_TABLE', QUERY_TABLE = 'USR02', OPTIONS = options, FIELDS = fields, ROWSKIPS = rowskips, ROWCOUNT = ROWS_AT_A_TIME)
rowskips += ROWS_AT_A_TIME
if len(result['DATA']) < ROWS_AT_A_TIME:
break
print("--- %s seconds in SAP ---" % (time.time() - start_time))
data_result = result['DATA']
length_result = len(data_result)
for line in range(0,length_result):
a= data_result[line]["WA"]
output_list.append(a)
print(output_list)

In my opinion, the problem is in the while True loop. I think you need to optimize your query logic (or change it). It is hard without knowing what you are interested in the DB, The other things looking easy and fast.
Something that could help is to try to not open and close the file continuously: try to compute your "B" column and then open and paste all at once in the xlsx file. It could help (but i'm pretty sure that is the query the problem)
P.S. Maybe you can use some timing library (like here) to compute WHERE you spend most of the time.