I have a PDF document that I am trying to extract text from. This is what I used:
import pdfplumber

with pdfplumber.open('myfile.pdf') as pdf:
    my_page = pdf.pages[2]  # pages is 0-indexed, so index 2 is the 3rd page
    text = my_page.extract_text()
    print(text)
Here is what the PDF looks like:
id description cost
1 toy_car $10.00
2 big_huge_description $20.00
_for_a_car
3 toy_kitchen $30.00
When I extract the text, a long description wraps onto a second line, and the overflow ends up at the start of that line, where the id column should be. For example:
id description cost
1 toy_car $10.00
2 big_huge_description $20.00
_for_a_car
3 toy_kitchen $30.00
How can I output the text so that it looks like this?
id description cost
1 toy_car $10.00
2 big_huge_description_for_a_car $20.00
3 toy_kitchen $30.00
Any suggestions?
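
One possible way to get the desired output is to post-process the extracted string rather than change the extraction itself. The sketch below is only an illustration under stated assumptions: every real row starts with an integer id and ends with a dollar amount, and any other line that follows a row is the wrapped tail of that row's description. The names `ROW` and `merge_wrapped_rows` are hypothetical helpers, not part of pdfplumber.

```python
import re

# Assumed row shape: integer id, description, dollar amount (e.g. "2 foo $20.00").
ROW = re.compile(r"^(\d+)\s+(\S.*?)\s+(\$\d+(?:\.\d{2})?)$")


def merge_wrapped_rows(text):
    """Join continuation lines onto the description of the preceding row."""
    merged = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = ROW.match(line)
        if match:
            # A complete row: keep its fields as [id, description, cost].
            merged.append(list(match.groups()))
        elif merged and isinstance(merged[-1], list):
            # Not a row, and the previous item is a row: treat this line
            # as the wrapped tail of that row's description.
            merged[-1][1] += line
        else:
            # Header or other text before any row: keep it unchanged.
            merged.append(line)
    return "\n".join(
        " ".join(item) if isinstance(item, list) else item for item in merged
    )


sample = (
    "id description cost\n"
    "1 toy_car $10.00\n"
    "2 big_huge_description $20.00\n"
    "_for_a_car\n"
    "3 toy_kitchen $30.00"
)
print(merge_wrapped_rows(sample))
```

With the sample text from the question, this joins `_for_a_car` onto row 2. It would misfire if a wrapped fragment itself looked like a full row (id, text, cost), so the pattern would need adjusting for other layouts.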