I'm planning to build a LangChain application that takes user input and returns the URL of the page most relevant to the request.
My data is in JSON format (around 35 pages):
{ page_name:{data:"",url:""}, .. }
- data is the content in that page
- url is the path
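To make the format concrete, here is a small made-up sample in that shape and how it can be loaded. The page names, text, and paths are hypothetical placeholders, not my real data.

```python
import json

# Hypothetical sample in the same shape as the real file:
# { page_name: {"data": <page text>, "url": <path>} }
raw = """
{
  "home": {"data": "Welcome to the site. Overview of what we offer.", "url": "/"},
  "pricing": {"data": "Plans and pricing for all tiers.", "url": "/pricing"}
}
"""

data = json.loads(raw)

# Each entry maps a page name to its content and its path.
for name, info in data.items():
    print(name, "->", info["url"])
```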
I tried using RAG (with RetrievalQA), but it didn't work:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_google_genai import GoogleGenerativeAIEmbeddings
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=GOOGLE_API_KEY)
all_splits = text_splitter.split_documents(documents)
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")
retriever = vectordb.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)
qa.run('about this website')
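The snippet above uses `documents` without showing how it is built. A rough sketch of one way it could be derived from the JSON, keeping the URL as metadata so it can be returned together with a match. `Page`, `build_pages`, and the sample data are hypothetical stand-ins; in LangChain the same two fields would presumably go into `Document(page_content=..., metadata=...)`.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # Stand-in for a LangChain Document: text plus free-form metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_pages(data):
    """Turn {page_name: {"data": ..., "url": ...}} into a list of Page objects."""
    pages = []
    for name, info in data.items():
        pages.append(
            Page(
                page_content=info["data"],
                # Keep the URL next to the text so it survives splitting/retrieval.
                metadata={"page_name": name, "url": info["url"]},
            )
        )
    return pages


# Hypothetical sample data in the format described above.
sample = {
    "home": {"data": "Welcome to the site.", "url": "/"},
    "pricing": {"data": "Plans and pricing.", "url": "/pricing"},
}

pages = build_pages(sample)
for page in pages:
    print(page.metadata["url"], "|", page.page_content)
```

The idea is that only the page text is embedded, while the URL travels along as metadata rather than being mixed into the text.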
I also tried combining data and url into a single string, but that didn't work correctly either:
data_dict={}
for name, info in data.items():
    print(f'Name: {name}, URL: {info["url"]}, Data: {info["data"]}')
    data_dict[name] = f'URL: {info["url"]}, Data: {info["data"]}'
I would appreciate it if someone could point me in the right direction for building this functionality.