Python solution for iterating through .xls spreadsheet with multiple tabs, deleting rows and first column, and export to csv?

209 Views Asked by At

I am extremely frustrated trying to write a code to iterate through a .xls (or .xlsx) spreadsheet where each sheet is a year, editing each sheet to remove rows 1-4,6,and 72-150, remove first column, then export that sheet to a csv. So I should have a csv per sheet (year). I have tried using pandas, xlrd, openpyxl, but keep getting errors on every line. Can someone help me? What is the simplest way with the fewest extra packages? keep getting that list not callable or string or wrong index. Please help.

Appreciate it!

import openpyxl
import csv 
workbook = openpyxl.load_workbook("C:/Users/example.xlsx")
worksheet = workbook.sheetnames
print(worksheet)
for sheet in worksheet:
    mrts_worksheet['B5'] = "Diff Label"
    sheet.delete_rows(mrts_worksheet.delete_rows(72, 100))
    sheet.delete_rows(mrts_worksheet.delete_rows(6, 1))
    sheet.delete_rows(mrts_worksheet.delete_rows(1, 4))
    sheet.delete_cols(1,1)
    with open("C:/Users/example"+ sheet +".csv", "w", newline="") as file_handle:
        csv_writer = csv.writer(file_handle)
        for row in sheet.iter_rows(): 
            csv_writer.writerow([cell.value for cell in row])
    file_handle.close()
2

There are 2 best solutions below

0
moken On

For your code sample to work given the details provided in the description it would need to change to something like this;
The deletion method is good, starting from the last rows to delete and moving up to row 1. Your code has delete_rows(72, 100) this will delete rows 72 to 171 which does not match the requirement, 72-150 so this is changed to delete_rows(72, 79).
Note: deleting row 1 will remove the headers.

The 'mrts_worksheet' parts have been removed.
I do not know what the purpose of this sheet is or how it's supposed to work with deleting rows in the other sheets. Even the writing of a value to the cell B5 being in the sheet loop would seem wasteful, mrts_worksheet['B5'] = "Diff Label" given you'd only need to do this one time only and not once for every sheet in the workbook as it doesn't change.

import openpyxl
import csv


workbook = openpyxl.load_workbook("data.xlsx")
worksheet = workbook.worksheets  # List of Worksheet objects

print(worksheet)
for sheet in worksheet:  # 'sheet' is a worksheet
    sheet.delete_rows(72, 79)  # Delete rows 72 - 150,  start at row 72 and delete the next 79 rows
    sheet.delete_rows(6, 1)  # Delete row 6, start at row 6 and delete that row only
    sheet.delete_rows(1, 4)  # Delete rows 1 - 4, start at row 1 and delete the next 4 rows
    sheet.delete_cols(1, 1)  # Delete column 'A'
    ### Create csv file name using sheet name (title)
    with open("example" + sheet.title + ".csv", "w", newline="") as file_handle:
        csv_writer = csv.writer(file_handle)
        for row in sheet.iter_rows():
            csv_writer.writerow([cell.value for cell in row])
    file_handle.close()

0
Dheeraj Malik On

You can use Spire.XLS for Python package to manipulate xls or xlsx files and convert them to csv.

Here is the command to install the package:

pip install Spire.Xls

Here is an example that removes specific rows and columns from each sheet of a .xls or .xlsx file and then save each sheet as a separate csv file:

from spire.xls import *

workbook = Workbook()
# Load a .xls file
# workbook.LoadFromFile("Input.xls")
# Load a .xlsx file
workbook.LoadFromFile("Input.xlsx")
 
for sheet in workbook.Worksheets:
    # Remove rows 72-150
    sheet.DeleteRow(72, 79)
    # Remove row 6
    sheet.DeleteRow(6, 1)
    # Remove rows 1-4
    sheet.DeleteRow(1, 4)
    # Remove column 1
    sheet.DeleteColumn(1, 1)
    # Save each sheet to a csv
    csv_name = sheet.Name + ".csv"
    sheet.SaveToFile(csv_name, ",", Encoding.get_UTF8())

workbook.Dispose()

Disclaimer: I work for the company that developed this package.