Thanks to other articles on this website, I managed to put together a script that will do the following:

  1. Collect the PDF file names from a directory and put them into a list.
  2. Start a data frame using target data from the first PDF in the directory.
  3. Use a for loop to add rows to the original data frame, each containing the same target data (pulled from the same section of each remaining PDF).

My first two steps work (code below):

file_names <- list.files(pattern = "\\.pdf$")  # pattern is a regular expression, not a glob
df <-
  extract_tables(
    file = "firstlastname.pdf",
    method = "decide",
    output = "data.frame"
  ) %>%
  pluck(2) %>%
  t() %>%
  as.data.frame() %>%
  slice(2) %>%
  select(1:3) %>%
  rename("inst" = "V1",
         "date" = "V2",
         "field" = "V3")

but my final step throws the following error: "Error in pluck(., 2) : object '*tmp*' not found"

for (i in file_names)
{
  new <-
    extract_tables(
      file = i,
      method = "decide",
      output = "data.frame"
      ) %>%
    pluck(2) %>%
    t() %>%
    as.data.frame() %>%
    slice(2) %>%
    select(1:3) %>%
    rename("inst" = "V1",
           "date" = "V2",
           "field" = "V3") %>%
    df[nrow(df) + 1, ] <- new
}

I am confused because I actually made it all the way through successfully a couple times, but I tried it again after closing RStudio and coming back, and it just won't work anymore. I'm a complete beginner just trying to automate my secretary job a little bit, but I'm probably in way over my head. All I can do is Google things, copy and paste code, and try to understand what everything means and how it comes together.

Unfortunately I can't provide my data files because they contain people's personal information, but the final result is supposed to look like a table with about 50 rows and 3 columns. I did take a photo the first time it worked, though:

[Image: successful data frame]

Thank you for reading. Any tips would be much appreciated!
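
A possible direction for step 3, offered as a sketch rather than a tested fix: in the loop above, the `%>%` left after `rename()` feeds the finished row into the `df[nrow(df) + 1, ] <- new` line, so R tries to treat the whole pipeline as the target of an assignment, which would explain the error. The version below drops that trailing pipe, wraps the per-file steps in a function, and binds the rows at the end instead of growing `df` inside the loop. The helper names `tidy_tables()` and `read_one_pdf()` are made up for illustration, and it assumes that `extract_tables()` comes from the tabulizer package and that dplyr and purrr are installed.

```r
library(dplyr)  # %>%, slice(), select(), rename(), bind_rows()
library(purrr)  # pluck()

# Hypothetical helper: reshape the list of tables from one PDF into a
# single row with the three target columns (same steps as in the question).
tidy_tables <- function(tables) {
  tables %>%
    pluck(2) %>%
    t() %>%
    as.data.frame() %>%
    slice(2) %>%
    select(1:3) %>%
    rename(inst = V1, date = V2, field = V3)  # no trailing %>% here
}

# Hypothetical helper: extract and tidy one PDF.
# Assumes extract_tables() is the tabulizer function and that the
# package is already attached, e.g. with library(tabulizer).
read_one_pdf <- function(path) {
  extract_tables(file = path, method = "decide", output = "data.frame") %>%
    tidy_tables()
}

# "\\.pdf$" is a regular expression matching names that end in ".pdf".
file_names <- list.files(pattern = "\\.pdf$")

# One row per PDF, stacked into a single data frame. The first PDF is
# handled like every other file, so no separate starting data frame is needed.
df <- bind_rows(lapply(file_names, read_one_pdf))
```

If firstlastname.pdf sits in the same directory as the other files, the loop in the question would also have added its row a second time; building everything from `file_names` in one pass avoids that.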
