How can I use pdftools in R to convert a large batch of PDF files to TXT files?


I'm trying to convert ~600 PDF files filled with tables to text format so I can do some data exploration. It looks like pdftools is my best bet for the job, but its help files are brief. The closest tutorial I found uses xpdf. Is there a way to do this using pdftools?

library("pdftools")
folder <- file.path("C:\\Users\\adarvishian\\Documents\\MEGA\\Consular Affairs\\Visa Statistics\\Scrape")
folder
length <- length(dir(folder))
length
dirpdf <- dir(folder)
dirpdf[1]


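for(i in 1:length(dir(folder)))
{
   # This passes the folder path (not the i-th file) to pdf_text(),
   # and `text` is overwritten on every iteration
   text <- pdf_text("C:\\Users\\adarvishian\\Documents\\MEGA\\Consular Affairs\\Visa Statistics\\Scrape")
}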
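For reference, `pdftools::pdf_text()` takes the path of a single PDF and returns a character vector with one element per page, so the loop above (which hands it the folder path on every iteration) cannot work as written. A minimal per-file sketch, assuming the PDFs sit directly in the folder and that holding all ~600 texts in memory is acceptable; `read_pdf_folder()` is a hypothetical helper name, not part of pdftools:

```r
# Hypothetical helper: read every PDF in `folder` with pdftools and return a
# named list with one character string per file (pages joined by newlines).
read_pdf_folder <- function(folder) {
  # Only .pdf files, matched case-insensitively, as full paths
  pdfs <- list.files(folder, pattern = "\\.pdf$", ignore.case = TRUE,
                     full.names = TRUE)
  texts <- lapply(pdfs, function(f) {
    pages <- pdftools::pdf_text(f)   # one element per page
    paste(pages, collapse = "\n")
  })
  names(texts) <- basename(pdfs)     # e.g. texts[["report.pdf"]]
  texts
}

# Usage (folder from the question):
# texts <- read_pdf_folder("C:\\Users\\adarvishian\\Documents\\MEGA\\Consular Affairs\\Visa Statistics\\Scrape")
# cat(texts[[1]])
```

Calling `pdftools::pdf_text()` with the `::` prefix means the helper can be defined without attaching the package first.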



Answer:

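# pdftools is loaded but not used below: the conversion itself is done
# by the xpdf command-line tool pdftotext.exe
library("pdftools")

folder <- file.path("C:\\Users\\adarvishian\\Documents\\MEGA\\Consular Affairs",
                    "Visa Statistics", "Scrape")
folder
# List only PDF files, so .txt output from earlier runs is skipped
dirpdf <- dir(folder, pattern = "\\.pdf$", ignore.case = TRUE)
n_files <- length(dirpdf)   # avoids shadowing base::length()
n_files
dirpdf[1]

# Full path to xpdf's pdftotext.exe
pdftotxt <- "C:\\Users\\adarvishian\\Documents\\R\\otherpackages\\xpdf-tools-win-4.00\\xpdf-tools-win-4.00\\bin64\\pdftotext.exe"

# Run pdftotext once per PDF. With no output file given, it writes
# <name>.txt next to <name>.pdf. Both paths are wrapped in double
# quotes because they contain spaces.
for (i in seq_along(dirpdf))
{
  pdf <- file.path(folder, dirpdf[i])
  # wait = TRUE converts one file at a time; wait = FALSE would start
  # all ~600 pdftotext processes at once
  system(paste("\"", pdftotxt, "\" \"", pdf, "\"", sep = ""), wait = TRUE)
}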
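The answer above still depends on xpdf. A pdftools-only sketch of the same batch job, writing `<name>.txt` next to each `<name>.pdf` the way pdftotext does by default. `txt_path_for()` and `convert_folder_to_txt()` are hypothetical helper names, and writing each page followed by a newline is an assumed layout choice, not something pdftools prescribes:

```r
# Hypothetical helper: "report.pdf" -> "report.txt" in the same folder
txt_path_for <- function(pdf) {
  sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE)
}

# Hypothetical helper: convert every PDF in `folder` to a .txt file next to it
# using pdftools only. Returns the paths of the files written, invisibly.
convert_folder_to_txt <- function(folder) {
  pdfs <- list.files(folder, pattern = "\\.pdf$", ignore.case = TRUE,
                     full.names = TRUE)
  out <- txt_path_for(pdfs)
  ok <- logical(length(pdfs))
  for (i in seq_along(pdfs)) {
    ok[i] <- tryCatch({
      pages <- pdftools::pdf_text(pdfs[i])   # one string per page
      # useBytes = TRUE writes the strings' bytes as-is instead of
      # re-encoding them to the native encoding
      writeLines(pages, out[i], useBytes = TRUE)
      TRUE
    }, error = function(e) {
      # Report the failing file and continue with the next one
      warning("Could not convert ", pdfs[i], ": ", conditionMessage(e))
      FALSE
    })
  }
  invisible(out[ok])
}

# Usage (folder from the question):
# convert_folder_to_txt("C:\\Users\\adarvishian\\Documents\\MEGA\\Consular Affairs\\Visa Statistics\\Scrape")
```

Because each file is wrapped in `tryCatch()`, one damaged PDF produces a warning instead of stopping the whole batch of ~600.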