Why do the PDF parsing libraries pdf2json and pdf-parse not work with the Next.js App Router?


I've been trying to implement PDF parsing in my Next.js app. The libraries pdf2json and pdf-parse don't seem to work with the new App Router.

Steps to reproduce:

  1. Run npx create-next-app@latest, follow the prompts, and answer Yes to using the App Router.
  2. Add an API route under app/api/test:
import { NextResponse } from "next/server";
import fs from "fs";
import PDFParser from "pdf2json";
import pdf from "pdf-parse";

export async function GET() {
  const pdfParser = new PDFParser();

  pdfParser.on("pdfParser_dataError", (errData: any) =>
    console.error(errData.parserError)
  );
  pdfParser.on("pdfParser_dataReady", (pdfData: any) => {
    console.log(pdfData);
  });

  pdfParser.loadPDF("./sample.pdf");
  return NextResponse.json({});
}
  3. Add a sample.pdf file in the root dir.
  4. Run curl localhost:3000/api/test from a terminal; pdf2json throws an uncaught error:
- error node_modules/pdf2json/lib/pdf.js (66:0) @ eval
- error Error [ReferenceError]: nodeUtil is not defined
  5. Trying pdf-parse instead returns a 404 Not Found for the API route:
import { NextResponse } from "next/server";
import fs from "fs";
import PDFParser from "pdf2json";
import pdf from "pdf-parse";

export async function GET() {
  const dataBuffer = fs.readFileSync("./sample.pdf");

  pdf(dataBuffer).then(function (data) {
    // number of pages
    console.log(data.numpages);
    // number of rendered pages
    console.log(data.numrender);
    // PDF info
    console.log(data.info);
    // PDF metadata
    console.log(data.metadata);
    // PDF.js version
    // check https://mozilla.github.io/pdf.js/getting_started/
    console.log(data.version);
    // PDF text
    console.log(data.text);
  });
  return NextResponse.json({});
}
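
Separately from the errors above, both handlers return their response before parsing has finished, so even a working parser would only log its output and never send it back. One way to wait for pdf2json's events is to wrap them in a Promise. This is a minimal sketch: waitForEvent and EmitterLike are hypothetical names, not part of pdf2json, and it assumes PDFParser's on() method fits the small interface below.

```typescript
// Minimal shape of an event emitter; pdf2json's PDFParser is assumed to satisfy it.
interface EmitterLike {
  on(event: string, listener: (payload: any) => void): unknown;
}

// Hypothetical helper: resolves with the payload of `readyEvent`
// and rejects with the payload of `errorEvent`, whichever fires first.
function waitForEvent<T>(
  emitter: EmitterLike,
  readyEvent: string,
  errorEvent: string
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    emitter.on(readyEvent, (payload: T) => resolve(payload));
    emitter.on(errorEvent, (payload: unknown) => reject(payload));
  });
}

// Intended use inside the route handler (not run here, needs pdf2json and next):
//   const pdfParser = new PDFParser();
//   const ready = waitForEvent(pdfParser, "pdfParser_dataReady", "pdfParser_dataError");
//   pdfParser.loadPDF("./sample.pdf");
//   const pdfData = await ready;
//   return NextResponse.json(pdfData);
```

The listeners are registered before loadPDF is called so that neither event can be missed.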

After creating a separate project with the old Pages Router, none of the above issues occurred and the PDF was parsed properly.

Is there anything I'm missing here?


There are 3 best solutions below

0
On

You need to add the file test/data/05-versions-space.pdf to your project.

I know this seems extremely random, but if you look into the pdf-parse code you will see that it needs this file. It can be any PDF, but the path and name have to match exactly.

[Screenshot: file structure]
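
To illustrate why this file matters: pdf-parse's index.js appears to run a debug block whenever module.parent is falsy, and that block reads this PDF as soon as the module is loaded. When the package is bundled, module.parent is presumably undefined, so the read runs and throws if the file is missing. Below is a simplified model of that behaviour, not the library's actual code; runEntryPoint, moduleParent and readFile are hypothetical names.

```typescript
// Simplified model of the start-up check in pdf-parse's index.js (not the real source).
// `moduleParent` stands in for Node's `module.parent`;
// `readFile` stands in for fs.readFileSync.
const DEBUG_PDF = "./test/data/05-versions-space.pdf";

function runEntryPoint(
  moduleParent: unknown,
  readFile: (path: string) => Uint8Array
): string {
  const isDebugMode = !moduleParent; // true when the module has no parent
  if (isDebugMode) {
    readFile(DEBUG_PDF); // throws if the file does not exist
    return "debug: read " + DEBUG_PDF;
  }
  return "normal: no file access";
}
```

Under this model there are two ways out: make the file exist so the read succeeds, or make sure the debug branch is never taken.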

1
On

You need to update your next.config.js file so that pdf-parse is treated as an external package instead of being bundled:

/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    serverComponentsExternalPackages: ["pdf-parse"],
  },
};

module.exports = nextConfig;
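
On newer Next.js versions (15 and later, as far as I know) this option moved out of experimental and was renamed serverExternalPackages. A sketch of the equivalent config, assuming the project uses a next.config.ts file; listing pdf2json as well is an assumption that may also help with the nodeUtil error, and is untested here.

```typescript
// next.config.ts (assumes Next.js 15 or later)
const nextConfig = {
  // Packages listed here are loaded with Node's own require instead of being bundled.
  // "pdf2json" is an assumption; the config above only lists "pdf-parse".
  serverExternalPackages: ["pdf-parse", "pdf2json"],
};

export default nextConfig;
```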

0
On

Here is how I fixed it.

Step 1:
npm i patch-package
Step 2:
npm i pdf-parse
Step 3:

Go to this file: node_modules\pdf-parse\index.js

Step 4:

Change this line (line 6 for me)

let isDebugMode = !module.parent;

to this one

let isDebugMode = false; //!module.parent;

That is, just set isDebugMode to false, so the debug block in that file never runs.

Step 5:

Run this command

npx patch-package pdf-parse
Step 6:

Go to package.json and add this postinstall script

"scripts": {
  //your other scripts

  "postinstall": "patch-package"
}
Step 7:

Delete this folder

.next\cache

Step 8:

Go to your Vercel deployment and redeploy without using the existing build cache (this is important).

Note:

If it still does not work (not likely), delete package-lock.json and run

npm install

(I am not sure whether deleting package-lock.json is risky.)

Let me know if it works for you.