I have had fabulous success parsing a few hundred PDF files, except for one holdout. When I use the following statement:
table_finder = page.debug_tablefinder(table_settings={"vertical_strategy": "lines", "horizontal_strategy": "lines", "edge_min_length": 50})
I get a great visual identification of the margins of the table. Yet when I extract the text, the last item in the list gets parsed incorrectly on a few lines. Note the empty strings on lines 4 and 6 of the output below, followed by combined icons on lines 5 and 7. This is NOT due to their being symbols; innumerable other tables are parsed just fine. I notice that the height of the 4th and 6th rows is relatively short. Might that be the culprit, and if so, how might I address it?
['Procedure', 'Appropriateness Category', 'Relative Radiation Level']
['MRI cervical spine without IV contrast', 'Usually Appropriate', 'O']
['CT cervical spine without IV contrast', 'May Be Appropriate', '☢☢☢']
['Radiography cervical spine', 'May Be Appropriate (Disagreement)', '']
['MRI cervical spine without and with IV\ncontrast', 'Usually Not Appropriate', '☢☢\nO']
['Radiographic myelography cervical spine', 'Usually Not Appropriate', '']
['CT myelography cervical spine', 'Usually Not Appropriate', '☢☢☢\n☢☢☢☢']
['CT cervical spine with IV contrast', 'Usually Not Appropriate', '☢☢☢']
['CT cervical spine without and with IV\ncontrast', 'Usually Not Appropriate', '☢☢☢']
['CTA neck with IV contrast', 'Usually Not Appropriate', '☢☢☢']
['Discography cervical spine', 'Usually Not Appropriate', '☢☢']
['Facet injection/medial branch block cervical\nspine', 'Usually Not Appropriate', '☢☢']
['MRA neck with IV contrast', 'Usually Not Appropriate', 'O']
['MRA neck without IV contrast', 'Usually Not Appropriate', 'O']
['MRI cervical spine with IV contrast', 'Usually Not Appropriate', 'O']
['Bone scan whole body with SPECT or\nSPECT/CT neck', 'Usually Not Appropriate', '☢☢☢']
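
To check the short-row hypothesis, a rough sketch like the following might help. It measures the height of each detected row and flags the unusually short ones, so they can be compared against the rows with the empty and combined cells. The helper names (flag_short_rows, inspect_table), the 0.75 ratio, and the file path are hypothetical; the pdfplumber calls used are page.find_tables, Table.rows, and each row's bbox. Whether tightening a tolerance such as snap_y_tolerance changes the result here is an assumption to test, not a confirmed fix.

```python
from statistics import median


def flag_short_rows(row_bboxes, ratio=0.75):
    """Return indices of rows whose height is below ratio * median row height.

    row_bboxes: list of (x0, top, x1, bottom) tuples, as pdfplumber reports them.
    ratio: hypothetical threshold chosen for illustration only.
    """
    heights = [bottom - top for (_x0, top, _x1, bottom) in row_bboxes]
    cutoff = ratio * median(heights)
    return [i for i, h in enumerate(heights) if h < cutoff]


def inspect_table(pdf_path, page_number=0, extra_settings=None):
    """Print row heights and extracted rows for each table found on one page."""
    import pdfplumber  # imported here so flag_short_rows works without it

    settings = {
        "vertical_strategy": "lines",
        "horizontal_strategy": "lines",
        "edge_min_length": 50,
    }
    # e.g. extra_settings={"snap_y_tolerance": 1} to experiment (assumption)
    settings.update(extra_settings or {})

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        for table in page.find_tables(table_settings=settings):
            bboxes = [row.bbox for row in table.rows]
            short = set(flag_short_rows(bboxes))
            for i, (bbox, cells) in enumerate(zip(bboxes, table.extract())):
                height = bbox[3] - bbox[1]
                mark = "SHORT" if i in short else ""
                print(f"{i:2d} height={height:6.2f} {mark:5s} {cells}")


# Pure-Python check of the helper on made-up bounding boxes:
example = [(0, 0, 10, 10), (0, 10, 10, 20), (0, 20, 10, 24), (0, 24, 10, 34)]
print(flag_short_rows(example))
```

Calling inspect_table("holdout.pdf") (a placeholder path) on the problem page would show whether the flagged rows line up with the rows whose last cell is empty. If they do, the symbols in those rows may be getting assigned to the neighbouring cell, and comparing runs with different extra_settings is one way to probe that.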