I have a program getting data from excel files. I'm using Python 3.11.4 and xlWings. The data consists of some standard data and some data defined by users input. The excel contains columns with technical data belonging to product id's located at the top of the column. The data is saved on a database that is created before the extraction starts. Information about the data that needs to be extracted are stored in a dictionary, with the headers of the database functioning as keys. For the standard data the value of the dictionary is an array with the found value from the excel, and a string used as a datatype for the creation of the columns in the database. For user defined data the array has two additional elements: the search string (what to look for in the excel) and the address where to find this string. The value in the dictionary that will be saved on the database is updated for every column that is looped through. The code loops through the documents given by the user.
The problem: when extracting data from many files at once, the code returns an error at a particular excel file:
pywintypes.com_error: (-2146827864, 'OLE error 0x800a01a8', None, None) at line 1395 (the last line of the following block):
#Retrieve properties
for header in headers:
if header_basis[header][3] is None:
header_basis[header][0] = "not found!"
else:
header_basis[header][0] = header_basis[header][3].offset(0,i).value
headers is an array with strings defining the name of the user defined data. These strings are the column headers of the database and the keys of the dictionary.
The code itself is quite long and has a lot of content that have little to do with this problem (I assume) But the important parts are here (seperated by (...)):
def search_productcard_excel(path, families, headers, search_strings, master):
(...)
#Create dictionary with value and database data type.
header_basis = {
"product_ID": [None, "BIGINT(20)"],
"doc_name": [None, "VARCHAR(50)"],
"family": [None, "VARCHAR(50)"],
"num_of_conductors": [None, "VARCHAR(50)"],
"type": [None, "VARCHAR(50)"],
"CSA": [None, "VARCHAR(50)"],
#"current": [None, "FLOAT(5)"],
"note_1": [None, "VARCHAR(50)"],
"note_2": [None, "VARCHAR(50)"],
}
for i, header in enumerate(headers):
#Add the search string and address for manual data too
header_basis[header] = [None, "VARCHAR(50)", search_strings[i], None]
(...)
#Create new table
sql_string = f"CREATE TABLE {temp_table} ("
for i, key in enumerate(header_basis.keys()):
if i > 0:
sql_string += ", "
sql_string += f"{key} {header_basis[key][1]}"
sql_string = sql_string + ")"
cursor.execute(sql_string)
connection.commit()
(...)
#Do for all document names
create_log_text("Start searching documents...\n")
for doc_name in doc_name_arr:
xml_files = []
for extension in ALLOWED_FILES:
xml_files.extend(glob.glob(os.path.join(path, f"{doc_name}.{extension}")))
#do for all found documents
for xml_file in xml_files:
(...)
#Find the start position of the given properties
for i, header in enumerate(headers):
search_string = header_basis[header][2]
#Search for both columns if "&&" is found
if search_string.find("&&") != -1:
char_pos = search_string.find("&&")
search_string_1 = search_string[0:char_pos-1]
search_string_2 = search_string[char_pos+3:]
temp_range = fun_search_string(search_string_1,
worksheet[PROPERTY_RANGE_BACKUP],
0)
if temp_range is not None:
for search_cell in temp_range:
if search_cell is None:
header_basis[header][3] = None
else:
temp_result = fun_search_string(
search_string_2,
worksheet[search_cell.offset(0,2).address],
0
)[0]
if temp_result is not None:
header_basis[header][3] = search_cell.offset(0,2)
break
else:
header_basis[header][3] = (
fun_search_string(
search_string,
worksheet[PROPERTY_RANGE],0
)[0]
)
#Get family name
header_basis["family"][0] = worksheet["B2"].value
#for every column (CSA)
create_log_text("Search given values...\n")
for i in range(1, MAX_SIZES):
(...)
#Retrieve properties
for header in headers:
if header_basis[header][3] is None:
header_basis[header][0] = "not found!"
else:
header_basis[header][0] = header_basis[header][3].offset(0,i).value
(...)
#Do for every product
for cell in id_cells:
(...)
#Add row table
#Create sql command
sql_string = f"INSERT INTO {temp_table} ("
for j, key in enumerate(header_basis.keys()):
if j > 0:
sql_string += ", "
sql_string += key
sql_string = sql_string + ") VALUES ("
for j, key in enumerate(header_basis.keys()):
val = header_basis[key][0]
if j > 0:
sql_string += ", "
try:
float(val)
sql_string = sql_string + str(val)
except ValueError:
sql_string = sql_string + f"'{str(val)}'"
except TypeError:
sql_string = sql_string + "''"
sql_string = sql_string + ")"
#print(sql_string)
cursor.execute(sql_string)
(...)
#close workbook
create_log_text("Closing file...\n")
workbook.close()
(...)
#close excel app
app.quit()
When extracting the data from only the one particular file where the error returns, instead of doing many documents at once, it doesn't return the error.
Additionaly, before simplifying the code a little, I stored the values in an array separate from the other data (address, datatype and search strings):
val_arr[i] = fun_search_string(search_string, worksheet[PROPERTY_RANGE],0)[0]
instead of
header_basis[header][3] = (fun_search_string(search_string, worksheet[PROPERTY_RANGE],0)[0])
This did not return any errors.
I got no idea what exactly the pywintypes.com_error is or what the 'OLE error 0x800a01a8' in it means and I have a hard time figuring t where to start because of it. I haven't found an answer on the internet either.
EDIT: I found a solution myself. I suspected it had something to do with memory. So I just set header_basis[header][3] is None before updating header_basis[header][3] every loop. This seemed to work. I
I suspected the error had something to do with a memory issue. Perhaps the dictionary unknowingly filled with data as the values weren't deleted but updated. I reset the dictionary value to None before updating it in the loop. This seemed to work. I'm not sure why exactly, because I would've thought that updating a dictionary value (e.g. header_basis[header][3] = blabla) would erase the previous value and would leave no leftover data. Perhaps it also has to do with the fact that a range is stored there.
The following code:
Became: