I need to extract a table from a PDF file.
The code is:
import tabula
pdf = tabula.read_pdf(arquivo, pages=(1, 2), lattice=True)
I convert both DataFrames to lists, as in the code below:
lista=pdf[1].values.tolist()
lista2=pdf[2].values.tolist()
My problem is that the conversion loses the first row of the DataFrame.
The result of the conversion for lista2 is:
"[[**8**,
'vitamínicos e/ou minerais /\rVitaminas: C (45mg), E (10mg),\rNiacina (16mg), A (600mcg), ac.\rpantotênico (5mg), D (5mcg), B6\r(1,3mg), B1 (1,2mg), B2 (1,3 mg),\rB12 (1mcg), ác. fólico (200mcg),\rbiotina (30mcg): Minerais: cálcio\r(90mg), fósforo (38mg),\rmanganês (45mg), ferro (5mg),\rzinco (5mg), selênio (30 mcg),\rmanganês (1,2mg), selênio\r(30mcg), iodo (100mcg):\rProbiótico: Lactobacillus\racidophilus / COMPRIMIDO /\rSEM MARCA',
4705050,
'CP',
360,
nan],
[**9**,
'vitaminas + minerais /\rpolivitaminas + poliminerais /\rCOMPRIMIDO REVESTIDO\r/ ZIRVIT MULTI - POR MARCA',
3970019,
'CP',
540,
nan],
[**10**,
'suplemento alimentar / óleo de\rmicroalgas e lecitina de soja /\rCÁPSULA / SEM MARCA',
5717310,
'CP',
360,
nan]]"
When I print the original pandas DataFrame pdf[2] (before calling values.tolist()), I get:
**8** vitamínicos e/ou minerais /\rVitaminas: C (45m... 4705050 CP 360 NaN
**9** vitaminas + minerais /\rpolivitaminas + polimi... 3970019 CP 540 NaN
**10** suplemento alimentar / óleo de\rmicroalgas e l... 5717310 CP 360 NaN
I have 4 products in the pandas DataFrame (7, 8, 9, 10), and when I convert it to a list, I lose the first one, product ID 7.
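One possible explanation, which I have not confirmed, is that the first table row (product 7) was read as the column header, and values.tolist() returns only the data rows, not the column labels. The sketch below reproduces that situation with pandas alone. The DataFrame is a hypothetical stand-in for pdf[2] with made-up placeholder values for product 7, and to_list_with_header is a helper name introduced only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for pdf[2]: the first table row (product 7)
# has ended up as the column labels instead of a data row.
# The values for product 7 are placeholders, not the real data.
df = pd.DataFrame(
    [
        [8, "item 8", 4705050, "CP", 360],
        [9, "item 9", 3970019, "CP", 540],
        [10, "item 10", 5717310, "CP", 360],
    ],
    columns=["7", "item 7", "0000000", "CP", "360"],
)

def to_list_with_header(frame):
    """Return the column labels as the first row, followed by the data rows."""
    return [frame.columns.tolist()] + frame.values.tolist()

data_only = df.values.tolist()   # column labels are not included here
rows = to_list_with_header(df)   # labels restored as the first row

print(len(data_only))  # 3
print(len(rows))       # 4
print(rows[0])
```

If this is what is happening, printing pdf[2].columns should show the product 7 values. Note that labels recovered this way are usually strings rather than numbers. Passing pandas_options={"header": None} to tabula.read_pdf might also keep the first row as data, although I have not verified that against this PDF.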
Any idea how to solve this? Thank you.