I have a question regarding text mining with the corpus package and its text_tokens() function. I want to use the function for stemming and removing stop words on a large data set (almost 1,000,000 comments), but I am having trouble with the output that text_tokens() produces. Here is a basic example of my data and code:
library(tidyverse)
library(corpus)
library(stopwords)
text <- data.frame(comment_id = 1:2,
                   comment_content = c("Hallo mein Name ist aaron", "Vielen Lieben Dank für das Video"))
tmp <- text_tokens(text$comment_content,
                   text_filter(stemmer = "de", drop = stopwords("german")))
My problem is that I want a data.frame as output, with comment_id in the first column and the word tokens in the second column. The output I would like looks as follows:
df <- data.frame(comment_id = c(1, 1, 1, 2, 2, 2),
                 comment_tokens = c("hallo", "nam", "aaron", "lieb", "dank", "video"))
I tried different do.call() variants (cbind/rbind), but they don't give me the result I need. So what is the function I'm looking for? Is it map() from the tidyverse?
Thank you in advance.
Cheers,
Aaron

Here's an option using imap_dfr from purrr:
Or, if you prefer using an anonymous function:
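Both variants can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: it assumes text_tokens() returns an unnamed list with one character vector of tokens per comment, so the index that imap_dfr() passes as the second argument is the comment's position, which happens to equal comment_id in the example data. The names res_formula and res_anon are made up for illustration.

```r
library(corpus)
library(stopwords)
library(purrr)
library(tibble)

text <- data.frame(comment_id = 1:2,
                   comment_content = c("Hallo mein Name ist aaron",
                                       "Vielen Lieben Dank für das Video"))

# One character vector of stemmed, stop-word-free tokens per comment
tmp <- text_tokens(text$comment_content,
                   text_filter(stemmer = "de", drop = stopwords("german")))

# Formula shorthand: .x is the token vector, .y is its position in the list
# (assumption: tmp is unnamed, so .y is an integer index)
res_formula <- imap_dfr(tmp, ~ tibble(comment_id = .y, comment_tokens = .x))

# Same idea with an explicit anonymous function
res_anon <- imap_dfr(tmp, function(tokens, id) {
  tibble(comment_id = id, comment_tokens = tokens)
})
```

If comment_id is not simply 1, 2, 3, ..., or if row-binding around a million small tibbles turns out to be slow, a base R alternative is data.frame(comment_id = rep(text$comment_id, lengths(tmp)), comment_tokens = unlist(tmp)), which repeats each id once per token.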