I want to take a tibble that represents dialogue and turn it into a .txt that can be manually edited in a text editor and then returned to a tibble for processing.
The key challenge I've had is separating the blocks of text so that they can be re-imported into a similar format after editing while preserving the "Speaker" designation.
Speed is important as the volume of files and the length of each text segment are large.
Here's the input tibble:
tibble::tribble(
~word, ~speakerTag,
"been", 1L,
"going", 1L,
"on", 1L,
"and", 1L,
"what", 1L,
"your", 1L,
"goals", 1L,
"are.", 1L,
"Yeah,", 2L,
"so", 2L,
"so", 2L,
"John", 2L,
"has", 2L,
"15", 2L
)
Here's the desired output in a .txt:
###Speaker 1###
been going on and what your goals are.
###Speaker 2###
Yeah, so so John has 15
Here's the desired return after correcting errors manually:
tibble::tribble(
~word, ~speakerTag,
"been", 1L,
"going", 1L,
"on", 1L,
"and", 1L,
"what", 1L,
"your", 1L,
"goals", 1L,
"in", 1L,
"r", 1L,
"Yeah,", 2L,
"so", 2L,
"so", 2L,
"John", 2L,
"hates", 2L,
"50", 2L
)
One way would be to add the speaker name followed by "\n" at the start of each speakerTag run.
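A minimal base R sketch of that idea follows; it is not necessarily the code originally posted here. It assumes the `###Speaker N###` header format from the question, puts each header on its own line, and uses a plain `data.frame` as a stand-in for the tibble (the same column operations apply to a tibble). The names `block_id`, `header`, `body` and `lines` are made up for illustration.

```r
# Example input, same values as the tibble in the question
df <- data.frame(
  word = c("been", "going", "on", "and", "what", "your", "goals", "are.",
           "Yeah,", "so", "so", "John", "has", "15"),
  speakerTag = c(rep(1L, 8), rep(2L, 6)),
  stringsAsFactors = FALSE
)

# Run-length encode the tags so that a speaker who talks again later
# starts a new block instead of being merged with an earlier one
runs <- rle(df$speakerTag)
block_id <- rep(seq_along(runs$values), runs$lengths)

# One header line and one line of space-separated words per block
header <- paste0("###Speaker ", runs$values, "###")
body <- vapply(split(df$word, block_id), paste, character(1), collapse = " ")

# Interleave: header 1, text 1, header 2, text 2, ...
lines <- as.vector(rbind(header, unname(body)))
```

Everything here is vectorised except the per-block `paste()`, so it should scale reasonably to long transcripts, though that is worth timing on real data.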
We can then write this out to a text file and check how it looks:
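As a sketch of the writing step, assuming the text has already been assembled into a character vector with one element per line, `writeLines()` is enough. The file name `dialogue.txt` and the use of `tempdir()` are placeholders.

```r
# Lines in the desired on-disk format (typed out here so the example is self-contained)
lines <- c(
  "###Speaker 1###",
  "been going on and what your goals are.",
  "###Speaker 2###",
  "Yeah, so so John has 15"
)

# Hypothetical output location; replace with the real path
path <- file.path(tempdir(), "dialogue.txt")
writeLines(lines, path)

# Print the file contents to check the result
cat(readLines(path), sep = "\n")
# ###Speaker 1###
# been going on and what your goals are.
# ###Speaker 2###
# Yeah, so so John has 15
```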
To read it back into R, we can do:
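The following is one possible way to parse the edited file, again in base R and again only a sketch. It assumes the file starts with a header, that headers keep the exact `###Speaker N###` form after editing, and that words are separated by whitespace. The file name and the variable names are illustrative.

```r
# Simulate a manually edited file (hypothetical path)
path <- file.path(tempdir(), "dialogue_edited.txt")
writeLines(c(
  "###Speaker 1###",
  "been going on and what your goals in r",
  "###Speaker 2###",
  "Yeah, so so John hates 50"
), path)

lines <- readLines(path)
lines <- lines[nzchar(trimws(lines))]  # ignore blank lines added while editing

# Identify header lines and extract the speaker number from each
is_header <- grepl("^###Speaker [0-9]+###$", lines)
stopifnot(is_header[1])  # assumption: the file begins with a header
speaker <- as.integer(sub("^###Speaker ([0-9]+)###$", "\\1", lines[is_header]))

# Each text line belongs to the most recent header above it
block <- cumsum(is_header)[!is_header]
words <- strsplit(trimws(lines[!is_header]), "[[:space:]]+")

# One row per word, with the speaker repeated for every word in the line
result <- data.frame(
  word = unlist(words),
  speakerTag = rep(speaker[block], lengths(words)),
  stringsAsFactors = FALSE
)
```

If a tibble is needed downstream, `tibble::as_tibble(result)` converts the result.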
Obviously, we can remove the "Speaker" part of the SpeakerTag column if it is not needed.
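For instance, assuming the tags came back as strings such as "Speaker 1" (hypothetical values; the exact format depends on how the file was read), everything that is not a digit can be dropped and the remainder converted to integer:

```r
# Hypothetical tag values as they might appear after re-import
tags <- c("Speaker 1", "Speaker 1", "Speaker 2")

# Keep only the digits, then convert to integer to match the original column type
speakerTag <- as.integer(gsub("[^0-9]", "", tags))
```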