This is the code I have so far for scraping a website. Earlier code defines the variables used here, but I don't think it is needed for this question. What I need is help from someone who understands better than I do how to manipulate a tibble with filter() or a function from tidytext or a related package.
I want to keep only the 'lines' that contain a colon, since I believe that is the best way to tell the lines containing book titles on this website apart from those that do not. The example I was given, for another website, used filter(lines %>% str_starts("")), but str_starts() is not useful for this website. How would I use filter() to get the lines I want in my case? I have included a screenshot of the tibble.
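library(dplyr)
library(stringr)
library(tidytext)

episode_1544_tbl %>%
  # Split the text column into one row per line, keeping the original case
  unnest_lines(output = lines,
               input = text,
               to_lower = FALSE) %>%
  # Remove leading and trailing whitespace
  mutate(lines = str_trim(lines)) %>%
  # Keep only the lines that contain a colon
  filter(str_detect(lines, ":"))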
Image showing tibble I want to manipulate
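To make the goal concrete without the scraped data, here is a minimal sketch of the colon filter on a made-up tibble. The name example_tbl and its contents are hypothetical stand-ins, not the real episode_1544_tbl. It assumes that str_detect() with fixed(":") is a reasonable way to test for a literal colon.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical stand-in for the unnested tibble; the real lines come from the scraped page.
example_tbl <- tibble(
  lines = c(
    "  Dune: A Novel ",
    "Thanks for listening",
    "Sapiens: A Brief History of Humankind",
    ""
  )
)

titles_tbl <- example_tbl %>%
  # Trim whitespace first so the stored lines are clean
  mutate(lines = str_trim(lines)) %>%
  # fixed(":") matches a literal colon anywhere in the line
  filter(str_detect(lines, fixed(":")))

print(titles_tbl)
```

The key point is that filter() expects a logical vector with one value per row, and str_detect(lines, pattern) returns exactly that. Piping lines into str_detect(lines, ...) supplies the column twice, so the pattern ends up in the wrong argument.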