Python script that takes data from multiple .ODS files and joins it all in 1 file

55 Views Asked by At

Please note that i'm not very familiar with Python so i'm sorry if this is a bit confusing.

I have a folder with many .ODS files which all contain the same 3 sheets. I need to take data from two of these sheets: -in the 1° sheet called 'prenotazioni' i need to take everything in the range U7:V27 -in the 2° sheet called 'chiusura' i need to take everything in the range J2:J22

What the script has to do is take these 2 ranges in every .ODS file that is in the specified folder and put it together in one big table (1st range then to it's right 2nd range and then go to new line for the second .ods file and so on). My code works 80%, it goes through all the files in the folder but for some reason the first 5 rows of the range J2:J22 are pushed to the bottom in a very weird manner and i can't seem to quite understand why because obviously the original data isn't written as my script outputs it. Also when i'm specifying the ranges in the script in theory i would have to put in prenotazione slice(7, 28), slice(20, 22) but it cuts out some values so i have to use slice(5, 26), slice(20, 22) to get the data i want even tho it would be wrong, same goes for the other range i have to put an offset to get the right data and not the "correct" values. The strange part is that i have to do it only for rows because the columns range is not affected by this issue. I've tried to understand for hours and i'm not even close to understanding what might be the problem.

This is my Python code

import os
import pandas as pd


def estrai_range(file_path, foglio, righe, colonne):
    df = pd.read_excel(file_path, sheet_name=foglio)
    return df.iloc[righe, colonne]


cartella_input = 'C:/Users/Sihan/Desktop/FINALE'

df_completo = pd.DataFrame()

for file_name in os.listdir(cartella_input):
    if file_name.endswith(".ods"):
        file_path = os.path.join(cartella_input, file_name)

        colonne_prenotazioni = estrai_range(file_path, 'prenotazione', slice(5, 26), slice(20, 22))

        colonne_chiusura = estrai_range(file_path, 'chiusura', slice(0, 23), slice(9, 10))

        df_temp = pd.concat([colonne_prenotazioni, colonne_chiusura], axis=1)

        df_temp['File'] = file_name

        df_completo = pd.concat([df_completo, df_temp], ignore_index=False)

cartella_output = 'C:/Users/Sihan/Desktop/FINALE'
nuovo_file_output = os.path.join(cartella_output, 'RESOCONTO.ods')
df_completo.to_excel(nuovo_file_output, index=False)

print(f"I dati sono stati estratti e salvati in {nuovo_file_output}")

To the left is what my script does and to the right is what i want it to do, i put only one .ods file in the folder to make my problem easier to understand

Thanks to everyone in advance!

0

There are 0 best solutions below