Using LangChain and Chroma DB to make a chatbot


I'm making a web application for my school project where you can create chatbots and train them on data such as PDF and DOCX files.

I have a Chroma DB server running in Docker, and I use this API endpoint in my application when I upload files:

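import os
import tempfile

import chromadb
from flask import Blueprint, jsonify, make_response, request
from werkzeug.utils import secure_filename
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Assumed blueprint setup (the app defines this elsewhere).
document_blueprint = Blueprint("document", __name__)


@document_blueprint.route('/file_upload', methods=['POST', 'OPTIONS'])
def document_token_count():
    # Answer the CORS preflight before doing any expensive work.
    if request.method == 'OPTIONS':
        response = make_response()
        response.headers.add('Access-Control-Allow-Origin', 'https://localhost:7197')
        response.headers.add('Access-Control-Allow-Headers', 'Content-Type')
        response.headers.add('Access-Control-Allow-Methods', 'POST')
        return response

    file = request.files.get('document')
    if not file:
        return jsonify({'error': 'No file provided'}), 400

    temp_file_path = os.path.join(tempfile.gettempdir(), secure_filename(file.filename))
    file.save(temp_file_path)

    try:
        # Chroma server running in Docker.
        client = chromadb.HttpClient(host='localhost', port=8000)
        embedding_function = OpenAIEmbeddings(openai_api_key="HIDDEN FOR STACKOVERFLOW")

        loader = PyPDFLoader(temp_file_path)
        documents = loader.load()

        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
        docs = text_splitter.split_documents(documents)

        # Add the chunks through the LangChain wrapper so they are embedded with
        # OpenAIEmbeddings, the same function used at query time. Calling
        # collection.add() on a plain chromadb collection would embed them with
        # Chroma's default embedding model instead, and the vectors would not match.
        db = Chroma(client=client, collection_name="president", embedding_function=embedding_function)
        db.add_documents(docs)

        # count_tokens_in_document is an app-specific helper that is not shown here.
        tokens = count_tokens_in_document(temp_file_path)
    finally:
        # Always clean up the temporary upload, even if embedding fails.
        os.remove(temp_file_path)

    return jsonify({'tokens': tokens, 'message': 'Embedding added to Chroma DB successfully.'})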
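The endpoint only uses PyPDFLoader, even though the chatbots are supposed to learn from DOCX files too. A minimal sketch of choosing a loader by file extension, assuming LangChain's Docx2txtLoader (it needs the docx2txt package) for Word files; loader_for is a hypothetical helper name:

```python
import os


def loader_for(path):
    """Return a LangChain document loader for the uploaded file at `path`.

    Hypothetical helper. The LangChain imports are done lazily inside each
    branch, so only the loader that is actually used has to be installed.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pdf":
        from langchain.document_loaders import PyPDFLoader  # needs pypdf

        return PyPDFLoader(path)
    if ext == ".docx":
        from langchain.document_loaders import Docx2txtLoader  # needs docx2txt

        return Docx2txtLoader(path)
    # Anything else is rejected so the endpoint can answer with a 400.
    raise ValueError(f"Unsupported file type: {ext or '(none)'}")
```

Inside the endpoint, loader = PyPDFLoader(temp_file_path) would become loader = loader_for(temp_file_path), with the ValueError turned into a 400 response; the rest of the endpoint stays unchanged.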

It inserts the embeddings into my Chroma DB under the collection "president"; the idea is that you can upload multiple files for one chatbot into one collection. How do I use LangChain to answer questions from that specific collection? I looked at LangChain's website, but there aren't really any good examples of how to do it with a Chroma DB running in Docker. As I said, it is a school project, but the idea is that it should work a bit like Botsonic or Chatbase, where you can ask questions to a specific chatbot that has its own knowledge base.
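Since every chatbot should get its own knowledge base, the hard-coded "president" name could instead be derived from the chatbot's ID. A minimal sketch: collection_name_for_bot and the bot_ prefix are hypothetical, and the constraints encoded here (3 to 63 characters, starting and ending with a letter or digit, otherwise only letters, digits, underscores and hyphens) are my reading of Chroma 0.4's collection-name validation, so check them against the version you run:

```python
import hashlib
import re


def collection_name_for_bot(bot_id):
    """Map a chatbot ID to a Chroma collection name (hypothetical helper).

    Keeps only lowercase letters, digits, underscores and hyphens, and makes
    sure the result starts and ends with an alphanumeric character and is at
    most 63 characters long.
    """
    # Replace every run of disallowed characters with a single hyphen,
    # then trim separators from both ends.
    slug = re.sub(r"[^a-z0-9_-]+", "-", str(bot_id).lower()).strip("-_")
    name = f"bot_{slug}" if slug else "bot"
    if len(name) > 63:
        # Long IDs: keep a prefix and append a short hash so names stay unique.
        digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:10]
        name = f"{name[:52]}_{digest}"
    return name
```

The upload endpoint and the question-answering side would then both use collection_name_for_bot(bot_id) instead of "president", so each chatbot only ever reads and writes its own collection.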

I found this example from LangChain:

import chromadb
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chat_models import ChatOpenAI
from langchain.schema.runnable import RunnablePassthrough
from langchain.prompts import PromptTemplate
from langchain.retrievers import SVMRetriever

client = chromadb.HttpClient(host="localhost", port=8000)
embeddings = OpenAIEmbeddings(openai_api_key="Hidden")

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

# Pass the HTTP client; without it, from_documents creates a separate local
# Chroma instance instead of writing to the Chroma server in Docker.
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings, client=client)

question = "What are the approaches to Task Decomposition?"
# Note: SVMRetriever embeds all_splits in memory; it does not query Chroma.
svm_retriever = SVMRetriever.from_documents(all_splits, embeddings)
docs_svm = svm_retriever.get_relevant_documents(question)
print(len(docs_svm))

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, openai_api_key="Hidden")

template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
rag_prompt_custom = PromptTemplate.from_template(template)

rag_chain = (
    {"context": svm_retriever, "question": RunnablePassthrough()} | rag_prompt_custom | llm
)

result = rag_chain.invoke("What is Task Decomposition?")
# ChatOpenAI returns an AIMessage; the answer text is in .content
print(result.content)

I assume there is a way to search an existing Chroma DB collection instead of loading and embedding the documents again.
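A minimal sketch of how this might look, assuming the same LangChain version as the imports above and the Chroma server used by the upload endpoint (localhost:8000, collection "president"). The idea is to wrap the existing collection with LangChain's Chroma class instead of calling Chroma.from_documents, and to use its as_retriever() in the chain in place of the SVMRetriever. format_docs and build_qa_chain are hypothetical names, and the embedding function must be the same one that was used when the documents were added:

```python
def format_docs(docs):
    """Join retrieved chunks into one context string for the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_qa_chain(collection_name="president", api_key="Hidden"):
    """Build a RAG chain over an existing collection on the Chroma server.

    LangChain and chromadb are imported here so this file can be loaded
    without them; in a real app they would normally sit at the top.
    """
    import chromadb
    from langchain.chat_models import ChatOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.prompts import PromptTemplate
    from langchain.schema.runnable import RunnablePassthrough
    from langchain.vectorstores import Chroma

    client = chromadb.HttpClient(host="localhost", port=8000)
    # Wrap the existing collection; nothing is loaded or re-embedded here.
    # The embedding model must match the one used at upload time.
    vectorstore = Chroma(
        client=client,
        collection_name=collection_name,
        embedding_function=OpenAIEmbeddings(openai_api_key=api_key),
    )
    # Return the 4 most similar chunks for each question.
    retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

    prompt = PromptTemplate.from_template(
        "Use the following pieces of context to answer the question at the end.\n"
        "If you don't know the answer, just say that you don't know.\n"
        "{context}\nQuestion: {question}\nHelpful Answer:"
    )
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, openai_api_key=api_key)

    # The question goes to the retriever (then format_docs) and to the prompt.
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
    )
```

With these hypothetical names, chain = build_qa_chain("president") followed by print(chain.invoke(question).content) would answer from that collection only; building one chain per chatbot, each with its own collection name, keeps the knowledge bases separate.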
