I was using apply_parallel function from pandarallel library, the below snippet(Function call) iterates over rows and fetches data from mongo db. While executing the same throws me EOFError and a mongo client warning as given below
Mongo function:
def fetch_final_price(model_name, time, col_name):
collection = database['col_name']
price = collection.find({"$and":[{"Model":model_name},{'time':time}]})
price = price[0]['price']
return price
Function call:
final_df['Price'] = df1.parallel_apply(lambda x :fetch_final_price(x['model_name'],x['purchase_date'],collection_name), axis=1)
MongoClient config:
client = pymongo.MongoClient(host=host,username=username,port=port,password=password,tlsCAFile=sslCAFile,retryWrites=False)
Error:
EOFError: Ran out of input
Mongo client warning:
"MongoClient opened before fork. Create MongoClient only "
How to make db calls in parallel_apply??
First of all, "MongoClient opened before fork" warning also provides a link for the documentation, from which you can know that in multiprocessing (which pandarallel base on) you should create MongoClient inside your function (
fetch_final_price), otherwise it likely leads to a deadlock:The second mistake, that leads to the exception in the function and the following EOFError, is that you use the brackets operator to a
findresult, which is actually an iterator, not a list. Consider usingfind_oneif you need only a first instance (alternatively, you can donext(price)instead of indexing operator, but it's not a good way to do this)