How might I filter out specific rows in this tibble that contain a colon in R?

I am using the code below while scraping a website. Earlier code defines the variable names, but I don't think it matters for this question: what I need is a better understanding of how to manipulate a tibble using filter() or a function from a package such as tidytext.

I want to keep only the 'lines' that contain a colon, as I believe this is the best way to distinguish the lines containing book titles on this website from those that do not. The example I was following, which used a different website, used filter(lines %>% str_starts("")), but str_starts() is not useful here because the titles on this website do not share a common prefix. How would I use filter() to get the lines I want in my case? I have included a screenshot of the tibble.

episode_1544_tbl %>%
    unnest_lines(output = lines,
        input = text,
        to_lower = FALSE) %>%
    # Trim leading and trailing whitespace from each line
    mutate(lines = lines %>%
        str_trim()) %>%
    # Keep only the lines that contain a colon
    filter(lines %>% str_detect(":"))

Image showing tibble I want to manipulate
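
Since the real tibble is only available as a screenshot, here is a minimal reproducible sketch of the kind of filter being asked about. It assumes the dplyr, stringr, and tibble packages are installed; the tibble `example_tbl` and its contents are made up for illustration and are not the scraped data. The idea is to pass the column to `str_detect()` once, with a literal colon as the pattern, and let `filter()` keep the rows where it returns `TRUE`.

```r
library(tibble)
library(dplyr)
library(stringr)

# Hypothetical stand-in for the unnested tibble; these lines are invented.
example_tbl <- tibble(
  lines = c(
    "Episode 1544",
    "  Book One: A Made-Up Subtitle  ",
    "Show notes and links",
    "Book Two: Another Made-Up Subtitle"
  )
)

titles_tbl <- example_tbl %>%
  # Trim leading and trailing whitespace, as in the question.
  mutate(lines = str_trim(lines)) %>%
  # Keep rows whose text contains a literal colon anywhere in the line.
  filter(str_detect(lines, fixed(":")))

print(titles_tbl)
```

Wrapping the pattern in `fixed()` treats the colon as plain text instead of a regular expression. A bare ":" would also work here, because a colon has no special meaning in a regex.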
