Python export to .csv without overwriting columns in for loop

I am trying to write data from several documents (processed in a for loop) to a CSV file in Python 3. However, the column gets overwritten on every iteration. How can I write the data from each document to the CSV file in rows below the previous ones, without overwriting?

import glob
import pandas as pd
from pdfminer.high_level import extract_text
for selectedfile in glob.glob(r'C:\Users\...\*.pdf'):
    text = extract_text(selectedfile)

Y = set(text)
Z = []
Znew = []
for val in Y:
    occurrences = wordlist2.count(val)
    if occurrences > 50:  # define min. no. of occurrences
        # print(val, ':', occurrences)
        Z.append(val)
        Znew.append(occurrences)

dict = {'Stem': Z, 'Count': Znew}
df = pd.DataFrame(dict)
df.to_csv('Exported list.csv', header=True, index=True, encoding='utf-8')

There is 1 solution below

The problem is in the first for loop. Each iteration replaces text with the newly extracted text, so only the last document is ever processed. Move the processing into the for loop so that it runs on each extraction. In this example, I've opened the file beforehand and written the header once. Then it's a matter of making sure the index is correct for each write, so that row numbers continue from one document to the next.

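import glob

import numpy as np
import pandas as pd
from pdfminer.high_level import extract_text

# newline='' is what pandas recommends when passing an open file object to to_csv
with open('Exported list.csv', 'w', encoding='utf-8', newline='') as outfile:
    outfile.write(",Stem,Count\n")  # header, written once
    base = 0  # index of the first row for the current document
    for selectedfile in glob.glob(r'C:\Users\...\*.pdf'):
        text = extract_text(selectedfile)

        # NOTE: wordlist2 is not defined in the question; it is assumed to be
        # built from text for each document.
        Y = set(text)
        Z = []
        Znew = []
        for val in Y:
            occurrences = wordlist2.count(val)
            if occurrences > 50:  # define min. no. of occurrences
                # print(val, ':', occurrences)
                Z.append(val)
                Znew.append(occurrences)

        data = {'Stem': Z, 'Count': Znew}  # avoid shadowing the built-in dict
        df = pd.DataFrame(data, index=np.arange(base, base + len(Z)))
        # header=False: the header row was already written above
        df.to_csv(outfile, index=True, header=False)
        base += len(Z)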
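
Another option, if you would rather not track the index by hand, is to collect one DataFrame per document in a list, join them with pd.concat, and write the file once at the end. This is only a rough sketch under some assumptions: the helper frequent_words, the documents dictionary and the extra File column are made up for illustration, and the small word lists stand in for whatever you build from extract_text for each PDF. Counter is swapped in for the repeated list.count calls.

```python
import io
from collections import Counter

import pandas as pd


def frequent_words(words, min_count):
    """Return a DataFrame of the words occurring more than min_count times.

    Hypothetical helper, not part of the original code.
    """
    counts = Counter(words)
    rows = [(word, n) for word, n in counts.items() if n > min_count]
    return pd.DataFrame(rows, columns=["Stem", "Count"])


# Stand-in data: in the real script each list would be built from
# extract_text(selectedfile) inside the glob loop.
documents = {
    "a.pdf": ["cat", "cat", "dog", "cat"],
    "b.pdf": ["dog", "dog", "bird"],
}

frames = []
for name, words in documents.items():
    df = frequent_words(words, min_count=1)
    df.insert(0, "File", name)  # remember which document each row came from
    frames.append(df)

# ignore_index=True renumbers the rows consecutively across all documents.
result = pd.concat(frames, ignore_index=True)

# Written to an in-memory buffer here; pass a path such as
# 'Exported list.csv' instead to write the real file.
buffer = io.StringIO()
result.to_csv(buffer, index=True)
csv_text = buffer.getvalue()
```

Because the file is written only once, the header appears a single time and there is no base counter to maintain. The trade-off is that every per-document DataFrame is held in memory until the end, which should be fine for word counts but is worth keeping in mind for very large batches.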