When parsing PDF files with PDFMinerLoader, I've noticed that it inserts extra line breaks at bullets and list numbers, separating each marker from its text. For example:
Original PDF:
1. use the...
2. replace the..
3. update the..
I got:
1.
2.
3.
use the..
replace the..
update the..
The same thing happens with bullet characters such as ●.
How can I address this problem?
I tried switching to a different parser, but the results were unsatisfactory: it ran pieces of text together.
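For reference, here is a minimal sketch of one possible post-processing workaround, not a fix inside the loader itself. It assumes the extracted text always looks like the output above: a run of marker-only lines followed directly by the same number of text lines, with no blank lines in between. The function name `rejoin_list_markers` and the set of marker characters are my own choices for illustration.

```python
import re

# A line that holds only a list marker, e.g. "1.", "2)", "●", "•", "-" or "*".
MARKER = re.compile(r"^\s*(?:\d+[.)]|[●•▪◦*-])\s*$")


def rejoin_list_markers(text: str) -> str:
    """Pair each stranded list marker with the text line that belongs to it.

    Assumption: N consecutive marker-only lines are followed by N text lines
    in the same order. If fewer text lines follow, the markers are left as-is.
    """
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        if not MARKER.match(lines[i]):
            out.append(lines[i])
            i += 1
            continue

        # Collect the whole run of marker-only lines.
        j = i
        while j < len(lines) and MARKER.match(lines[j]):
            j += 1
        markers = [m.strip() for m in lines[i:j]]

        # Take the same number of following lines as the item texts.
        items = lines[j:j + len(markers)]
        if len(items) < len(markers):
            # Not enough text lines to pair with; keep the markers untouched.
            out.extend(lines[i:j])
            i = j
            continue

        out.extend(f"{m} {t.strip()}" for m, t in zip(markers, items))
        i = j + len(markers)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "1.\n2.\n3.\nuse the..\nreplace the..\nupdate the.."
    print(rejoin_list_markers(sample))
```

This would be applied to the `page_content` of each document returned by the loader. It will mis-pair lines if the PDF produces a different layout (for example, blank lines between markers and text), so it is only a starting point.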