Performing similarity search on embedded PDF


I have embedded a PDF using the OpenAI embeddings and saved the result in a local file. I am now trying to take a text input, for example "cat", and perform a similarity search against those saved embeddings. I have tried the following implementation:

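    import fs from "fs";
    import { OpenAIEmbeddings } from "langchain/embeddings/openai";
    import { HNSWLib } from "langchain/vectorstores/hnswlib";

    const embedder = new OpenAIEmbeddings();
    const inputEmbedding = await embedder.embedQuery(input_prompt);
    const jsonString = fs.readFileSync('./embedded.json', 'utf-8');
    const book_embeddings = JSON.parse(jsonString);
    const ind = new HNSWLib(275); // this line alone already throws the error below
    for (let i = 0; i < book_embeddings.length; i++) {
        ind.add(book_embeddings[i], i);
    }
    const k = 2;
    const result = ind.similaritySearch(inputEmbedding, k);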

275 is the length of the embedded list representing the PDF. When I run this, I receive the following error, which I don't understand: "Cannot read properties of undefined (reading 'index')". The same error occurs if I just instantiate the HNSWLib object on its own, which suggests to me that something may be wrong with the way I imported the library. To import it I used: import { HNSWLib } from "langchain/vectorstores/hnswlib";

I managed to get it working by first creating a vector store from the raw PDF text, split into paragraphs, with something like const vectorStore = await HNSWLib.fromTexts(), passing in the text to embed and the embeddings object. However, this isn't what I'm looking for, because I already have the embeddings of the PDF document. A function like HNSWLib.fromEmbeddings() would do the job, but unfortunately that doesn't seem to exist. Any suggestions? Thanks
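To make the goal concrete: with only 275 stored vectors, the search itself can be sketched without any index at all, as a brute-force cosine similarity scan. This is a swapped-in technique, not HNSWLib. The sketch assumes that embedded.json parses to an array of plain number arrays, all with the same dimension as the query embedding. The names cosineSimilarity, topK and Match are hypothetical helpers, and the toy 3-dimensional vectors only stand in for real embeddings.

```typescript
// Brute-force cosine similarity over precomputed embeddings (not HNSW).
// Assumption: every stored embedding is a plain number[] with the same
// length as the query embedding.

interface Match {
  index: number; // position of the chunk in the stored embeddings array
  score: number; // cosine similarity; higher means more similar
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat its similarity as 0.
  return denom === 0 ? 0 : dot / denom;
}

function topK(query: number[], stored: number[][], k: number): Match[] {
  return stored
    .map((vec, index) => ({ index, score: cosineSimilarity(query, vec) }))
    .sort((x, y) => y.score - x.score) // best match first
    .slice(0, k);
}

// Toy vectors standing in for the parsed contents of embedded.json.
const stored: number[][] = [
  [1, 0, 0],
  [0, 1, 0],
  [0.9, 0.1, 0],
];
const matches = topK([1, 0, 0], stored, 2);
console.log(matches.map((m) => m.index));
```

In the setup from the question, stored would be book_embeddings and the query would be inputEmbedding, with each returned index pointing back at the corresponding paragraph. If an HNSW index is still wanted, the LangChain HNSWLib class appears to take an embeddings object plus an options object in its constructor, rather than a number, and to expose an addVectors(vectors, documents) method for precomputed vectors. Treat those signatures as an assumption to check against the installed version.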


