I extracted a series of texts from an XML file (with BeautifulSoup) and stored them in a list of strings (each string is one text). Now I want to transform that list with a list comprehension so that it becomes a list of lists, where each inner list contains the lowercased, stemmed words of one text, without punctuation.
The problem is threefold:
a) I can't remove the " " element (I tried adding if word != " " but it had no effect).
b) When I use the string library to remove punctuation, things like 26-year-old turn into 26yearold. How can I avoid that while still removing punctuation (with string)?
c) wasn't turns into wasnt.
This is the list comprehension I am using to build everything. I want to remove the " " element and find a better way to handle words that contain "-".
list_of_texts = [[stem(word.lower().translate(str.maketrans('', '', string.punctuation)).replace("\n", " ")) for word in text.split()] for text in list_of_texts]
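For reference, here is a minimal sketch of one way the three points could be approached. It rests on several assumptions. The stray element is probably an empty string "" rather than " ": text.split() already drops all whitespace, so a token made only of punctuation (such as "-" or "...") becomes "" after translate. The stem function below is only a hypothetical stand-in for whatever stemmer is actually in use. The names PUNCT_TO_REMOVE, TABLE and clean_text, and the sample text, are made up for illustration.

```python
import string

# Remove all punctuation except "-" and "'" so that "26-year-old"
# and "wasn't" keep their internal characters.
PUNCT_TO_REMOVE = string.punctuation.replace("-", "").replace("'", "")
TABLE = str.maketrans("", "", PUNCT_TO_REMOVE)


def stem(word):
    # Hypothetical placeholder: swap in the real stemmer here.
    return word


def clean_text(text):
    tokens = []
    for word in text.split():  # split() also handles "\n"
        # Lowercase, drop unwanted punctuation, then trim any "-" or "'"
        # left dangling at the edges of the token.
        cleaned = word.lower().translate(TABLE).strip("-'")
        if cleaned:  # skip tokens that became empty strings
            tokens.append(stem(cleaned))
    return tokens


# Made-up sample input, only for demonstration.
list_of_texts = ["The 26-year-old wasn't there - really!\nNo ... idea."]
list_of_lists = [clean_text(text) for text in list_of_texts]
print(list_of_lists)
```

The filter is applied to the cleaned token rather than to the raw word, which may be why a check like if word != " " on the original word never matched anything.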