My goal is to extract the string that follows a predefined text. In this case I would like to read the following value:
I found out that this is possible using a regex, so I tried this:
const fs = require("fs");
const PDFParser = require("pdf2json");

// Get all the filenames from the patients folder
const files = fs.readdirSync("templates");

// All of the parse patients
let patients = [];

// Make a IIFE so we can run asynchronous code
(async () => {
  // Await all of the patients to be passed
  // For each file in the patients folder
  await Promise.all(files.map(async (file) => {
    // Set up the pdf parser
    let pdfParser = new PDFParser(this, 1);

    // Load the pdf document
    pdfParser.loadPDF(`templates/${file}`);

    // Parsed the patient
    let patient = await new Promise(async (resolve, reject) => {
      // On data ready
      pdfParser.on("pdfParser_dataReady", (pdfData) => {
        // The raw PDF data in text form
        const raw = pdfParser.getRawTextContent().replace(/\r\n/g, " ");

        // Return the parsed data
        resolve({
          gesamtbetrag: /Amount\s(:*?)--/i.exec(raw)[1].trim()
        });
      });
    });

    // Add the patient to the patients array
    patients.push(patient);
  }));

  // Save the extracted information to a json file
  fs.writeFileSync("patients.json", JSON.stringify(patients));
})();
I'm getting an error saying that index 1 is being read from null:
Cannot read property '1' of null
Thanks
The exec method returns the first match as an array, or null if there is no match.

Your pattern /Amount\s(:*?)--/i is searching for the word Amount, followed by a whitespace character, followed by zero or more : colons, followed by --. There is no match, so exec returns null, and the [1] array index fails.

Try console.log(/Amount\s(:*?)--/i.exec(raw)) and make sure it returns an array, and not null. Tweak your RegEx until it matches. console.log(raw) will show you the text you are dealing with, and tools like RegexBuddy will let you debug your RegEx.
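
As a rough sketch of the null check described above: the helper name extractAmount and the sample strings are made up for illustration, and the pattern assumes that (.*?) (any characters, matched lazily) was intended where the question has (:*?) (colons only). Whether that is the right pattern depends on what console.log(raw) actually shows.

```javascript
// Hypothetical helper: returns the trimmed text captured between "Amount"
// and "--", or null when the pattern does not match.
function extractAmount(raw) {
  // Assumption: (.*?) was intended instead of (:*?), which only matches colons.
  const match = /Amount\s(.*?)--/i.exec(raw);

  // exec returns null when nothing matches, so check before indexing.
  return match ? match[1].trim() : null;
}

// Made-up sample text; real pdf2json output will look different.
console.log(extractAmount("Amount 123,45 EUR -- Page 1")); // prints 123,45 EUR
console.log(extractAmount("no matching label here"));      // prints null
```

Returning null instead of throwing lets the calling code decide what to do with a PDF that lacks the expected label, for example skipping it or logging the file name.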