I used the pdfplumber library to extract all the lines in my PDF. A sample extracted line looks like this:
Total Return Transportation $16.01
The goal is to put all of these into a data frame. How do I use regex to group this line so that I may isolate the charge type and dollar amount?
Currently, I have:
    totals = re.compile(r"(\ATotal) ([\w]+) ([\w]*)")
    for line in text.split("\n"):
        line2 = totals.search(line)
        if line2:
            print(line)
            print(line2.group(1))
Group 1 returns "Total", Group 2 returns "Return", and Group 3 returns "Transportation", but I'm unable to write a group that captures the dollar amount. Any suggestions?
Note: Dollar amounts over $1,000 contain a "," that might need to be accounted for in the regex.
Just change your regex like so:
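One possible pattern, as a minimal sketch rather than the only way to do it: capture everything between "Total" and the "$" as the charge type, then capture the digits (with optional commas and cents) as the amount. The sample `text`, the second "Fuel Surcharge" line, and the `charge_type`/`amount` names are made up for illustration and are not from your PDF.

```python
import re

# Stand-in for the text extracted by pdfplumber (illustrative lines only).
text = (
    "Total Return Transportation $16.01\n"
    "Some other line\n"
    "Total Fuel Surcharge $1,234.50"
)

# Group 1: charge type, i.e. everything between "Total " and " $".
# Group 2: the amount, allowing "," thousands separators and optional cents.
totals = re.compile(r"\ATotal (.+?) \$([\d,]+(?:\.\d{2})?)\s*\Z")

rows = []
for line in text.split("\n"):
    match = totals.search(line)
    if match:
        charge_type = match.group(1)
        # Strip the commas before converting so "1,234.50" parses as a number.
        amount = float(match.group(2).replace(",", ""))
        rows.append({"charge_type": charge_type, "amount": amount})

print(rows)
```

Because `rows` is a list of dicts, passing it to `pandas.DataFrame(rows)` should give one column per key. If you would rather keep your three original groups, appending ` \$([\d,]+\.\d{2})` to your existing pattern adds the amount as group 4, assuming the charge type is always exactly two words.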