Indexing custom data on Pinecone

145 Views Asked by Krishna Gupta At 18 June 2023 at 17:31

I have a company's data (basically a dump of their website) and I want to index it so that I can build a semantic search engine. The data looks roughly like this: [{'title': 'some title', 'content': "web page's content", 'url': "the page's url"}, {...}, ...], where each dictionary represents one page.

The problem is the size of the content. If a page's content is too large, I have to split it into chunks, vectorize each chunk, and then index the chunks on Pinecone. All chunks from the same page share the same title and URL, so when I query the database I often get several results with the same URL and title. How can I avoid this?

Alternatively, what if I don't make chunks and instead vectorize the entire content of each page, even when it is large, and index that on Pinecone? Would the search results still be effective? Is there any other efficient way to index this data to build a powerful, effective search engine?
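To make the setup concrete, here is a minimal sketch of the chunking step described above, together with one possible way to collapse query results so that each URL appears only once (keep the best-scoring chunk per page). It deliberately makes no Pinecone calls. The helper names (`chunk_text`, `build_records`, `dedupe_by_url`), the word-based chunk size and overlap, and the shape of a match (a plain dict with `score` and `metadata` keys) are all assumptions for illustration, not part of any library.

```python
from typing import Dict, List


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def build_records(pages: List[Dict]) -> List[Dict]:
    """Turn pages into one record per chunk, copying title and url as metadata."""
    records = []
    for page in pages:
        for i, chunk in enumerate(chunk_text(page["content"])):
            records.append({
                # Hypothetical id scheme: page url plus chunk position.
                "id": f"{page['url']}#chunk-{i}",
                "text": chunk,  # this is the text that would be embedded
                "metadata": {
                    "title": page["title"],
                    "url": page["url"],
                    "chunk_index": i,
                },
            })
    return records


def dedupe_by_url(matches: List[Dict], top_k: int = 5) -> List[Dict]:
    """Keep only the highest-scoring match per url, sorted by score descending."""
    best: Dict[str, Dict] = {}
    for match in matches:
        url = match["metadata"]["url"]
        if url not in best or match["score"] > best[url]["score"]:
            best[url] = match
    ranked = sorted(best.values(), key=lambda m: m["score"], reverse=True)
    return ranked[:top_k]
```

With this approach one would typically request more matches from the index than are finally shown, since several of them may collapse into a single page after deduplication.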
