How to pass multer file data into LangChain.js WebPDFLoader correctly?


I'm using multer in Node.js to handle file uploads. When a PDF file is uploaded, I want to split it into chunks and store those chunks in a vector store (using LangChain.js) for a RAG application.

import { WebPDFLoader } from 'langchain/document_loaders/web/pdf';
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

// file is provided by multer
const data = file.buffer;
const mimetype = file.mimetype;

const blob = new Blob([data]);
const loader = new WebPDFLoader(blob, {
    splitPages: false,
});

const docs = await loader.load();

const textSplitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
});
const splitDocs = await textSplitter.splitDocuments(docs);
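The two splitter settings above can be pictured with a much simpler scheme. The sketch below is plain fixed-size chunking with overlap, written only for illustration: it is not the algorithm RecursiveCharacterTextSplitter implements (that splitter tries separators such as paragraph breaks, line breaks and spaces first), but it shows how chunkSize: 1000 and chunkOverlap: 200 relate, namely that each chunk starts 800 characters after the previous one. The function name slidingWindowChunks is hypothetical.

```javascript
// Hypothetical helper: fixed-size chunking with overlap.
// NOT LangChain's RecursiveCharacterTextSplitter algorithm; it ignores
// separators and only illustrates what chunkSize / chunkOverlap mean.
function slidingWindowChunks(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // distance between chunk start positions
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk already reaches the end
  }
  return chunks;
}

// Usage: a 2,600-character string with the question's settings.
const sample = "x".repeat(2600);
const chunks = slidingWindowChunks(sample, 1000, 200);
console.log(chunks.map((c) => c.length)); // [ 1000, 1000, 1000 ]
```

With these settings the last 200 characters of one chunk are repeated at the start of the next, which is what keeps sentences that straddle a boundary retrievable from either chunk.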

When fetching a PDF from a URL instead of from the multer buffer, this method works as expected:

const url = "https://dagrs.berkeley.edu/sites/default/files/2020-01/sample.pdf";

const response = await fetch(url);
const data = await response.blob();
console.log(data);
const loader = new WebPDFLoader(data, {
    splitPages: false,
});

When I console.log(data) in the above code, I get: Blob { size: 54836, type: 'application/pdf' }
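The difference between the two Blobs can be reproduced without multer or LangChain. This sketch assumes Node.js 18 or later, where Blob and Buffer are globals, and uses a few hypothetical header bytes in place of a real file.buffer. It shows that new Blob([buffer]) without options reports an empty type, while passing { type } gives the same metadata as the fetched Blob; the bytes inside both Blobs are identical.

```javascript
// Hypothetical stand-in for multer's file.buffer: just a PDF header line, not a real PDF.
const pdfBytes = Buffer.from("%PDF-1.4\n", "latin1");

// Wrapped as in the question: no options, so the Blob's type is the empty string.
const untyped = new Blob([pdfBytes]);

// Wrapped with an explicit MIME type, matching what fetch().blob() reported.
const typed = new Blob([pdfBytes], { type: "application/pdf" });

console.log({ size: untyped.size, type: untyped.type }); // { size: 9, type: '' }
console.log({ size: typed.size, type: typed.type }); // { size: 9, type: 'application/pdf' }
```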

When creating the Blob from the multer upload, do I need to include more data than just file.buffer? If so, how would I do that?



Best answer (ADITYA):

Create the Blob with the uploaded file's MIME type, so that it carries the same type as the Blob returned by fetch():

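// The question uses ES module `import` syntax, where `require` is not defined,
// so import Blob from node:buffer (on Node.js 18+ the global Blob works too).
import { Blob } from 'node:buffer';
import { WebPDFLoader } from 'langchain/document_loaders/web/pdf';

// Assuming 'file' is provided by multer (memory storage, so file.buffer is set)
const data = file.buffer;
const mimetype = file.mimetype;

// Create the Blob with the uploaded file's MIME type
const blob = new Blob([data], { type: mimetype });

// Now you can use 'blob' with LangChain.js
const loader = new WebPDFLoader(blob, {
    splitPages: false,
});
const docs = await loader.load();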
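If the conversion is needed in more than one route, it could be wrapped in a small helper. This is a sketch under stated assumptions: multerFileToBlob is a hypothetical name, it assumes multer's memory storage (the storage engine that populates file.buffer) and Node.js 18+ for the global Blob, and it falls back to application/pdf because multer's mimetype is taken from the client's upload rather than verified, so it could be empty or wrong.

```javascript
// Hypothetical helper: converts a multer-style file object (memory storage)
// into a Blob that keeps the reported MIME type.
function multerFileToBlob(file) {
  if (!file || !Buffer.isBuffer(file.buffer)) {
    throw new TypeError("Expected a multer file with a Buffer in file.buffer (memoryStorage)");
  }
  // multer's mimetype comes from the client's request; fall back to
  // application/pdf if it is empty (assumption: this route only accepts PDFs).
  const type = file.mimetype || "application/pdf";
  return new Blob([file.buffer], { type });
}

// Usage with a fake object shaped like a multer file (no real upload involved).
const fakeFile = {
  originalname: "sample.pdf",
  mimetype: "application/pdf",
  buffer: Buffer.from("%PDF-1.4\n", "latin1"),
};
const blob = multerFileToBlob(fakeFile);
console.log({ size: blob.size, type: blob.type }); // { size: 9, type: 'application/pdf' }
```

The returned Blob is what the answer above passes to new WebPDFLoader(blob, { splitPages: false }).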