This is what I have so far:
import re
import csv
outfile1 = open('test_output.csv', 'wt')
outfileWriter1 = csv.writer(outfile1, delimiter=',')
rawtext = open('rawtext.txt', 'r').read()
print(rawtext)
rawtext = rawtext.lower()
print(rawtext)
re.sub('[^A-Za-z0-9]+', '', rawtext)
print(rawtext)
First of all, when I run this the punctuation doesn't get removed, so I'm wondering whether there's something wrong with my expression.
Secondly, I'm trying to produce a .csv list of all the words, each flagged with whether it contained punctuation or not. For example, a text file reading "Hello! It's a nice day." would output:
ID, PUNCTUATION, WORD
1, Y, hello
2, Y, its
3, N, a
4, N, nice
5, Y, day
I know I can use .split() to split up the words, but other than that I have no idea how to go about this. Any help would be appreciated.
You can do something like this:
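Two things are worth noting about the snippet in the question. First, `re.sub` returns a new string and leaves its argument untouched (Python strings are immutable), so the result has to be assigned to something. Second, `[^A-Za-z0-9]+` also matches whitespace, so applying it to the whole text would glue all the words together. The sketch below therefore splits on whitespace first and cleans each token separately. The names `NON_ALNUM`, `flag_words` and `write_rows` are made up for illustration, and the rule "a word had punctuation if cleaning changed it" is an assumption about what the flag should mean.

```python
import csv
import io
import re

# Same character class as in the question: anything that is not an
# ASCII letter or digit. Note that this also matches whitespace.
NON_ALNUM = re.compile(r'[^A-Za-z0-9]+')


def flag_words(text):
    """Return (id, flag, word) tuples for each whitespace-separated token.

    Hypothetical helper: the flag is 'Y' when stripping non-alphanumeric
    characters changed the token, otherwise 'N'.
    """
    rows = []
    for token in text.lower().split():
        # sub() returns a new string, so the result must be kept.
        word = NON_ALNUM.sub('', token)
        if not word:
            continue  # the token was nothing but punctuation
        flag = 'Y' if word != token else 'N'
        rows.append((len(rows) + 1, flag, word))
    return rows


def write_rows(rows, outfile):
    """Write a header line plus the rows to an open text file object."""
    writer = csv.writer(outfile)
    writer.writerow(['ID', 'PUNCTUATION', 'WORD'])
    writer.writerows(rows)


rows = flag_words("Hello! It's a nice day.")

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to the file from the question instead, something like `with open('test_output.csv', 'w', newline='') as f: write_rows(flag_words(rawtext), f)` should work; `newline=''` is what the `csv` module documentation recommends for files passed to `csv.writer`. Be aware that `csv.writer` emits `1,Y,hello` without a space after each comma.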