Camelot PDF failing to strip text

632 Views Asked by André Luís At 02 June 2025 at 02:34

I have this PDF and I'm trying to work on its very first table.

The issue happens when the name of the employer (EMPREGADOR) wraps onto two lines.

I'm using the following command to try to strip the data correctly:

import camelot

# Stream flavor, with newlines stripped from each extracted cell
tables = camelot.read_pdf('tipo1/t1_3.pdf', pages='1', flavor='stream', edge_tol=500, strip_text='\n')
df = tables[0].df
print(df)

But the result is the following:

                      0                             1                           2
0            EMPREGADOR              DATA DE ADMISSÃO                   PIS/PASEP
1           ABC ABC ABC                                                          
2                                          07/01/2008                   123123123
3                  LTDA                                                          
4  CARTEIRA DE TRABALHO       INSCRIÇÃO DO EMPREGADOR             NÚMERO DA CONTA
5                123123                        123123                  1231231231
6         DATA DE OPÇÃO  DATA E CÓDIGO DE AFASTAMENTO                   CATEGORIA
7            07/01/2008               30/09/2011 - N2                           1
8         TIPO DE CONTA                 TAXA DE JUROS  VALOR PARA FINS RECISÓRIOS
9               OPTANTE                      3.0% a.a                     R$ 0,00

I read the docs but didn't find anything that would help me extract the employer's (EMPREGADOR) data correctly (in this case, ABC ABC ABC LTDA).

This is a problem because the employer's name may span 1, 2, 3 or even more lines, which makes the DataFrame messy and hard to process in code.

Any suggestions?

André Luís On 13 May 2021 at 22:06 BEST ANSWER

As Stefano Fiorucci mentioned in the comments, Camelot does not currently support this. The solution was to clean up the data manually after extraction.
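
One possible shape for that manual cleanup, as a minimal sketch rather than anything Camelot provides: treat every row whose first cell is a known field label as a header row, and join all the fragments found between two header rows into a single value row. The `HEADER_LABELS` set and the `merge_wrapped_rows` helper are hypothetical names made up for this example, and the label list assumes the form layout shown in the question. The function works on plain lists so it can be fed with `df.values.tolist()`.

```python
# Hypothetical: first-column labels that mark a header row in this form.
HEADER_LABELS = {"EMPREGADOR", "CARTEIRA DE TRABALHO", "DATA DE OPÇÃO", "TIPO DE CONTA"}

def merge_wrapped_rows(rows, header_labels=HEADER_LABELS):
    """Collapse every run of rows between two header rows into one value row."""
    merged = []
    pending = None  # per-column lists of text fragments for the current value row
    for row in rows:
        cells = [str(c).strip() for c in row]
        if cells[0] in header_labels:
            if pending is not None:
                # Flush the value row collected under the previous header.
                merged.append([" ".join(frags) for frags in pending])
            merged.append(cells)
            pending = [[] for _ in cells]
        elif pending is not None:
            # Continuation line: keep only the non-empty cells, column by column.
            for frags, cell in zip(pending, cells):
                if cell:
                    frags.append(cell)
        else:
            merged.append(cells)  # rows before the first header are kept unchanged
    if pending is not None:
        merged.append([" ".join(frags) for frags in pending])
    return merged

# The first rows from the question, as df.values.tolist() would return them.
rows = [
    ["EMPREGADOR", "DATA DE ADMISSÃO", "PIS/PASEP"],
    ["ABC ABC ABC", "", ""],
    ["", "07/01/2008", "123123123"],
    ["LTDA", "", ""],
    ["CARTEIRA DE TRABALHO", "INSCRIÇÃO DO EMPREGADOR", "NÚMERO DA CONTA"],
    ["123123", "123123", "1231231231"],
    ["DATA DE OPÇÃO", "DATA E CÓDIGO DE AFASTAMENTO", "CATEGORIA"],
    ["07/01/2008", "30/09/2011 - N2", "1"],
    ["TIPO DE CONTA", "TAXA DE JUROS", "VALOR PARA FINS RECISÓRIOS"],
    ["OPTANTE", "3.0% a.a", "R$ 0,00"],
]
result = merge_wrapped_rows(rows)
print(result[1])
```

With the real table, `pd.DataFrame(merge_wrapped_rows(df.values.tolist()))` would rebuild a DataFrame. This only works if the labels are known in advance and a value never happens to equal one of them, so the label set would need adjusting for other layouts.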
