What is the code to split a sentence into a list of its constituent words AND punctuation marks? Most text preprocessing programs tend to remove punctuation.
For example, if I enter this:
"Punctuations to be included as its own unit."
The desired output would be:
result = ['Punctuations', 'to', 'be', 'included', 'as', 'its', 'own', 'unit', '.']
Many thanks!
You might want to consider using the Natural Language Toolkit (nltk). Try this:
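A minimal sketch of what that could look like, assuming nltk is installed and its tokenizer data has been downloaded (for example with `nltk.download('punkt')`; newer releases may ask for `punkt_tab` instead). The helper name `tokenize` and the regex fallback are my own additions for illustration, not part of nltk; the fallback only kicks in when nltk or its data is unavailable.

```python
import re


def tokenize(sentence):
    """Split a sentence into words and punctuation marks, each as its own token."""
    try:
        # nltk's standard word tokenizer keeps punctuation as separate tokens.
        from nltk import word_tokenize
        return word_tokenize(sentence)
    except (ImportError, LookupError):
        # Assumption: a crude standard-library fallback for when nltk or its
        # tokenizer data is missing. It matches runs of word characters, or
        # any single character that is neither a word character nor whitespace.
        return re.findall(r"\w+|[^\w\s]", sentence)


print(tokenize("Punctuations to be included as its own unit."))
```

The regex fallback is cruder than nltk's tokenizer and may split contractions and similar cases differently, so treat it as a stopgap rather than a replacement.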
Output:
['Punctuations', 'to', 'be', 'included', 'as', 'its', 'own', 'unit', '.']