cannot pickle '_thread.RLock' object while serializing FAISS object


I am trying to build a langchain model. I have OpenAI set up and have embedded some data from URLs into a FAISS object, but when I try to pickle that object I get an error saying it contains a '_thread.RLock'. I have since learned that this comes from FAISS.from_documents(): the index object it returns cannot be pickled. I have not been able to resolve this issue.

# -*- coding: utf-8 -*-
"""Langchain_LLM.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1DWToK3XFOM0v5bl7-LwT0GBfKyYVulnb
"""

!pip install python-magic langchain unstructured streamlit openai tiktoken faiss-gpu

import os
import streamlit as st
import pickle
import time
from langchain import OpenAI
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredURLLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"  # key redacted

llm = OpenAI(temperature = 0.9, max_tokens=500)

loader = UnstructuredURLLoader(
    urls = [
        "https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html",
        "https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html"
    ]
)
data = loader.load()
len(data)

data[0].metadata

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,    # size of each chunk created
    chunk_overlap = 200,  # overlap between chunks, to maintain context
)
docs = text_splitter.split_documents(data)
len(docs)

docs[2]

# Create embeddings of the chunks using OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

# Pass the documents and embeddings in order to create a FAISS vector index
vectorindex_openai = FAISS.from_documents(docs, embeddings)

# Store the created vector index locally
file_path="vector_index.pkl"
with open(file_path, "wb") as f:
    pickle.dump(vectorindex_openai, f)

The error is:

TypeError                                 Traceback (most recent call last)
<ipython-input-74-15688820a1ef> in <cell line: 3>()
      2 file_path="vector_index.pkl"
      3 with open(file_path, "wb") as f:
----> 4     pickle.dump(vectorindex_openai, f)

TypeError: cannot pickle '_thread.RLock' object

I was trying to create a vector_index.pkl file
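The failure can be reproduced without LangChain or FAISS at all: pickle refuses any object graph that contains a lock, and the FAISS vector store wrapper holds one internally. A minimal stdlib sketch (the Holder class is illustrative, not part of LangChain):

```python
import pickle
import threading

class Holder:
    """Illustrative object that, like the FAISS wrapper, carries a lock."""
    def __init__(self):
        self.lock = threading.RLock()

try:
    pickle.dumps(Holder())
except TypeError as e:
    print(e)  # cannot pickle '_thread.RLock' object
```

Any attribute anywhere in the object graph that is an RLock triggers the same TypeError, which is why pickling the whole vector store fails.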

1 Answer
I had the same issue and solved it with the code below; don't use pickle, use FAISS's own persistence instead.

vectorindex_openai = FAISS.from_documents(docs, embeddings)

vectorindex_openai.save_local("faiss_store")

Run the code and you will get a folder named "faiss_store" containing two files, "index.faiss" and "index.pkl". If you want to use the stored data later, you can load it with the line below (note that newer LangChain versions may also require passing allow_dangerous_deserialization=True to load_local):

FAISS.load_local("faiss_store", OpenAIEmbeddings())
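For context on why save_local works where pickle fails: pickle cannot serialize a lock, so the supported route for FAISS is save_local/load_local as shown above. For your own classes, the standard-library workaround is to drop the lock in __getstate__ and recreate it in __setstate__. A stdlib-only sketch (SafeHolder is illustrative, not a LangChain class):

```python
import pickle
import threading

class SafeHolder:
    """Illustrative class that carries an unpicklable lock safely."""
    def __init__(self, data):
        self.data = data
        self._lock = threading.RLock()

    def __getstate__(self):
        # Exclude the lock from the pickled state.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.RLock()  # recreate the lock on load

restored = pickle.loads(pickle.dumps(SafeHolder([1, 2, 3])))
print(restored.data)  # [1, 2, 3]
```

This is the general mechanism; FAISS.save_local effectively does the same thing by writing the raw index ("index.faiss") and the picklable metadata ("index.pkl") separately.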