Filtering Pinecone vector database by user id

I'm having trouble filtering the vectors in my Pinecone index by user ID. My initial plan was for the chat function to take the user ID from the route and then pre-filter the search so that only vectors tagged with that user ID are considered.

import os

import pinecone
from flask import request, session, jsonify
from langchain.vectorstores import Pinecone
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationalRetrievalChain

# app, embeddings, text_field, openai_api_key, get_bot_temperature and
# get_custom_prompt are defined elsewhere in the application, and
# pinecone.init(...) is assumed to have been called at startup.

@app.route('/<int:user_id>/chat', methods=['POST'])
def chat(user_id):
    user_message = request.form.get('message')

    # Load this user's conversation history from the session
    # (f-string, so the key actually contains the user ID)
    history_key = f'conversation_history_{user_id}'
    conversation_history = session.get(history_key, [])

    index_name = os.getenv("PINECONE_INDEX")
    index = pinecone.Index(index_name)

    vectorstore = Pinecone(index, embeddings.embed_query, text_field)

    bot_temperature = get_bot_temperature(user_id)
    custom_prompt = get_custom_prompt(user_id)

    # Initialize the chat model with this user's temperature setting
    llm = ChatOpenAI(
        openai_api_key=openai_api_key,
        model_name='gpt-3.5-turbo',
        temperature=bot_temperature
    )

    # Define the prompt template; the doubled braces survive the f-string
    # as the {context} and {question} placeholders
    prompt_template = f"""
        {custom_prompt}

        CONTEXT: {{context}}

        QUESTION: {{question}}"""

    # Create a PromptTemplate object with input variables for context and question
    TEST_PROMPT = PromptTemplate(input_variables=["context", "question"], template=prompt_template)

    # Keep the last 8 exchanges in a ConversationBufferWindowMemory
    # (created fresh on every request, so it starts empty each time)
    memory = ConversationBufferWindowMemory(memory_key="chat_history", return_messages=True, k=8)

    # Only retrieve vectors whose metadata user_id equals this user's ID.
    # The filter goes inside search_kwargs, and user_id is passed as a
    # plain value; {user_id} would be a Python set.
    retriever = vectorstore.as_retriever(
        search_kwargs={"filter": {"user_id": {"$eq": user_id}}}
    )

    # Create a ConversationalRetrievalChain with the custom prompt and chat memory
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        combine_docs_chain_kwargs={"prompt": TEST_PROMPT},
    )

    # Handle the user input and get the response
    response = conversation_chain.run({'question': user_message})

    # Save the user message and bot response under the same per-user key
    conversation_history.append({'input': user_message, 'output': response})
    session[history_key] = conversation_history

    return jsonify(response=response)
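The original `retriever=` line has two problems worth isolating. In `{"$eq": {user_id}}` the inner braces build a Python set containing the ID rather than the ID itself, and LangChain's `as_retriever` generally expects the filter inside `search_kwargs` rather than as a top-level `filter=` argument. The session key has a related slip: without the `f` prefix, `'conversation_history_{user_id}'` is one literal string shared by every user. Below is a minimal stdlib-only sketch of building both per-user keys; the helpers `user_search_kwargs` and `history_key` are hypothetical names, and it assumes `user_id` was stored in metadata with the same type (an integer here) that the route receives.

```python
import json


def user_search_kwargs(user_id: int, k: int = 4) -> dict:
    """Build the search_kwargs dict for vectorstore.as_retriever().

    Hypothetical helper: assumes each vector was upserted with a
    'user_id' metadata field of the same type as the route parameter.
    """
    return {
        "k": k,  # number of chunks to retrieve
        "filter": {"user_id": {"$eq": user_id}},  # scalar value, not {user_id}
    }


def history_key(user_id: int) -> str:
    """Per-user session key; the f-prefix interpolates the ID."""
    return f"conversation_history_{user_id}"


# The original expression wraps the ID in a set, which is not a valid
# filter value and cannot even be serialized to JSON.
buggy_filter = {"user_id": {"$eq": {42}}}
try:
    json.dumps(buggy_filter)
    serializable = True
except TypeError:
    serializable = False

fixed = user_search_kwargs(42)
print(json.dumps(fixed["filter"]))  # {"user_id": {"$eq": 42}}
print(serializable)                  # False
print(history_key(42))               # conversation_history_42
```

In the route this would be used as `vectorstore.as_retriever(search_kwargs=user_search_kwargs(user_id))`. If the IDs were stored as strings at upload time, the filter must use `str(user_id)` instead, because a number and a string will not match.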

I also considered creating a separate index for each chatbot (my app is essentially a RAG PDF reader), naming each index after the user ID and swapping the index name per request, but I believe that would get very expensive.
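The usual alternative to one index per user is a single shared index in which every vector carries its owner's ID in metadata, with each query restricted to that ID. The toy below is an in-memory stand-in, not Pinecone's API: the `Record` class and `query` function are hypothetical and only illustrate the filter-then-rank behavior that a metadata filter such as `{"user_id": {"$eq": user_id}}` asks the service to apply.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    id: str
    values: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def query(records: list[Record], vector: list[float], user_id: int, top_k: int = 2) -> list[str]:
    """Toy stand-in for a filtered query on one shared index.

    Keeps only records whose metadata user_id equals the caller's,
    then ranks the survivors by cosine similarity to the query vector.
    """
    candidates = [r for r in records if r.metadata.get("user_id") == user_id]
    candidates.sort(key=lambda r: cosine(r.values, vector), reverse=True)
    return [r.id for r in candidates[:top_k]]


shared_index = [
    Record("a1", [1.0, 0.0], {"user_id": 1}),
    Record("a2", [0.0, 1.0], {"user_id": 1}),
    Record("b1", [1.0, 0.1], {"user_id": 2}),
]

print(query(shared_index, [1.0, 0.0], user_id=1))  # ['a1', 'a2']
print(query(shared_index, [1.0, 0.0], user_id=2))  # ['b1']
```

`b1` is nearly identical to the query vector, yet user 1 never sees it: the metadata filter, not the similarity score, decides whose documents are eligible, so one index can serve every user.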


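You can filter on metadata: store user_id in each vector's metadata when you upsert it, then pass a metadata filter when you search. The filter key must be the metadata field name (user_id) and the value must be the actual ID, not the string 'user_id':

retriever_m = vectorstore.as_retriever(search_kwargs={
    'filter': {'user_id': {'$eq': user_id}},
})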
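The filter only matches vectors that were stored with a `user_id` field, so the ingestion side has to attach it. A minimal sketch, assuming the PDF has already been split into text chunks and that `embed` is any callable returning a list of floats (for example `embeddings.embed_query` from the question). The records follow the `id`/`values`/`metadata` dict shape that recent Pinecone clients accept in `Index.upsert(vectors=...)`; the `"text"` key assumes the LangChain `Pinecone` wrapper was built with `text_field = "text"`, since that wrapper reads each chunk's content back out of that metadata field. The helper `build_records`, the `fake_embed` stand-in, and the ID scheme are hypothetical.

```python
from typing import Callable


def build_records(
    chunks: list[str],
    user_id: int,
    source: str,
    embed: Callable[[str], list[float]],
    text_field: str = "text",
) -> list[dict]:
    """Turn one user's text chunks into Pinecone-style upsert records.

    Hypothetical helper: every record carries the owner's user_id in its
    metadata so that queries can filter on {"user_id": {"$eq": user_id}}.
    """
    records = []
    for i, chunk in enumerate(chunks):
        records.append({
            # Prefix IDs with the user so two users uploading the same
            # file name do not overwrite each other's vectors
            "id": f"user{user_id}-{source}-{i}",
            "values": embed(chunk),
            "metadata": {
                "user_id": user_id,  # the field the retriever filters on
                "source": source,    # original file name, kept for reference
                text_field: chunk,   # where the LangChain wrapper reads text from
            },
        })
    return records


def fake_embed(text: str) -> list[float]:
    # Stand-in embedding for the demo only; use a real embedding call instead
    return [float(len(text)), 1.0]


records = build_records(["first chunk", "second"], 42, "report.pdf", fake_embed)
print(records[0]["id"])        # user42-report.pdf-0
print(records[1]["metadata"])  # {'user_id': 42, 'source': 'report.pdf', 'text': 'second'}
# With a live index, index.upsert(vectors=records) would store them.
```

Storing `user_id` as the same type the route produces (an integer here) keeps the `$eq` filter in the retriever matching without any conversion.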