is it possible to split data while moving it from one excel file to another in python? Pandas or openpyxl

75 Views Asked by Curandera At 19 January 2024 at 23:20

Essentially I need to translate one excel document into another.

Both sheets are formatted differently, but contain most of the same information - however in sheet 1, some data is formatted in a different way.

For Example "Name" in Sheet 1, and "First Name" "Last Name" in Sheet 2

is it possible to have my script do this for me? What about looking for seperators like a comma to split "Address" into "Street" "city" "state" "zip" -- or is that best done post-translate using Excel tools.

I've been able to read rows directly using openpyxl with code like this:

    step = 2
    read_start_row = 4
    write_start_row = 3
    amount_of_rows = 30

    for i in range(0, amount_of_rows, step):
        #copy from wb1
        c = ws1.cell(row=read_start_row+i, column=4)
        #paste in ws2
        ws2.cell(row=write_start_row+(i/step), column=4, value=c.value)

but not sure where to start when trying to also alter the data.

Original Q&A

There are 1 best solutions below

moken On 20 January 2024 at 02:15 BEST ANSWER

This is a quick example if just copying the data.

The Example sheet has two columns; 'Name' and 'Address' where;

Name includes "First" and "Last" name separated by space
Address includes "Street" "City" "State" and "Zip" separated by comma

The example code reads the source Excel sheet splits the two columns into the constituents and writes the result back to the destinaton sheet

import pandas as pd


### Read data from source sheet
df = pd.read_excel('source.xlsx', sheet_name='Sheet1')

### Split the necessary columns on the delimiters
df[['First', 'Last']] = df['Name'].str.split(' ', n=1, expand=True)  # Delimiter is space
df[['Street', 'City', 'State', 'Zip']] = df['Address'].str.split(', ', n=3, expand=True)  # Delimiter is comma

### Drop the now unnecessary columns
df = df.drop(['Name', 'Address'], axis=1)

### Reorder columns, probably not needed in this case but ensures the columns are in correct order.
df = df[['First', 'Last', 'Street', 'City', 'State', 'Zip']] 

### Write to the destination sheet, start row 2 (startrow 1), drop index and headers
with pd.ExcelWriter('dest.xlsx', mode='a', engine='openpyxl', if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name="Sheet1", startrow=1, index=False, header=False)

Resultant dataframe

   First   Last          Street     City State         Zip
0  Mavis   West  421 E DRACHMAN   TUCSON    AZ  85705-7598
1   John  Spurs     100 MAIN ST  SEATTLE    WA       98104
2   Jack   East   105 KROME AVE    MIAMI    FL  33185 3700

In this example the destination Sheet contains the headers already on row 1 (note this is row 0 for to_excel). For this since writing to an existing sheet which already contains data (Headers) we use 'mode=a' (append) which requires Openpyxl as the engine.
The to_excel write excludes the Headers from the dataframe being they already exist, however if preferred the destination sheet could be empty and the headers written along with the columns.

is it possible to split data while moving it from one excel file to another in python? Pandas or openpyxl

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in EXCEL

Related Questions in OPENPYXL

Related Questions in DATA-CONVERSION

Trending Questions

Popular # Hahtags

Popular Questions