Multiple .docx text files converted into a .csv table: How can I do this in R?


I am relatively new to R and am looking for an automated way to save the text of hundreds of .docx files into one .csv file, which I can then use for computational text analysis. The .docx files are all similarly structured but have different file names. Each .docx file should become one row in the table, with the following columns: date, URL, title, text. I would also like to add a column that contains an ID for each row. Can anyone help me?

So far, I have tried to do this with the readtext() function, which worked up to the point where I tried to put the different parts into one data frame. I also do not yet know how to write a loop over multiple files that are named differently.

    library(readtext)

    # Read the text of a single file
    doc.text <- readtext(".../mytext.docx")$text
    # readtext() returns a data frame; its text column holds the plain text of the file

    # Split the text into parts at each newline character
    doc.parts <- strsplit(doc.text, "\n")[[1]]
    doc.parts

    # First line of the document: title
    title <- doc.parts[1]
    title

    # Fourth line of the document: date
    date <- doc.parts[4]

    # Put the parts into a one-row data frame
    x2 <- data.frame(date = date, title = title, text = doc.text,
                     stringsAsFactors = FALSE)
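
A minimal sketch of how the single-file steps above might be extended to a whole folder, one row per file, with an ID column and a CSV at the end. The function names `parse_doc` and `docx_to_csv`, the folder and output paths, and the position of the URL (`url_line = 2`) are assumptions made for illustration; only the title on line 1 and the date on line 4 come from the code above.

```r
# Turn the plain text of one document into a one-row data frame.
# Line positions are assumptions: title (1) and date (4) follow the
# code above; url_line = 2 is only a guess and may need adjusting.
parse_doc <- function(doc_text, title_line = 1, url_line = 2, date_line = 4) {
  parts <- strsplit(doc_text, "\n")[[1]]
  data.frame(
    date  = parts[date_line],
    url   = parts[url_line],
    title = parts[title_line],
    text  = doc_text,
    stringsAsFactors = FALSE
  )
}

# Read every .docx file in a folder and write one row per file to a CSV.
# File names do not matter; list.files() collects whatever ends in .docx.
docx_to_csv <- function(folder, out_file) {
  files <- list.files(folder, pattern = "\\.docx$", full.names = TRUE)
  stopifnot(length(files) > 0)
  rows <- lapply(files, function(f) parse_doc(readtext::readtext(f)$text))
  result <- do.call(rbind, rows)
  result <- cbind(id = seq_len(nrow(result)), result)  # running ID per row
  write.csv(result, out_file, row.names = FALSE)
  invisible(result)
}

# Example call (hypothetical paths):
# docx_to_csv("path/to/folder", "all_texts.csv")
```

Keeping the parsing in its own function means the line positions can be checked on a plain string before the whole folder is processed.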