KeyError: 'What is Included'


"C:\Users\manoj\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexes\base.py", line 3797, in get_loc
    raise KeyError(key) from err

My data is in an Excel workbook, which can be accessed at the link below:

https://docs.google.com/spreadsheets/d/1x0Rumw0Kf4O2uEz1E6dwVdM7tTdzuEDEpZQhGH2kOBM/edit#gid=125177684

The workbook contains two sheets. I am working on the "Missing Data Removed" sheet and want to produce output that matches the "Orignal Sheet" sheet.

I am trying to get the output file:

For this I am using column T, "what is Included". On checking the code I found that each product has multiple sub-products, so I need to append a row for each of those sub-products below the product's row, with all the required information (highlighted in green in the sheet).

import pandas as pd
from bs4 import BeautifulSoup

def extract_href_count(html):
    soup = BeautifulSoup(html, 'html.parser')
    return len(soup.find_all('a', href=True))

def process_data(input_file, output_file):
    # Read the Excel file into a DataFrame
    df = pd.read_excel(input_file)

    # Define a function to process each row
    def process_row(row):
        href_count = extract_href_count(row['What is Included'])
        
        # If more than one href is found, append new rows
        if href_count > 1:
            for _ in range(href_count - 1):
                df.loc[len(df.index)] = row

    # Apply the processing function to each row
    df.apply(process_row, axis=1)

    # Save the updated DataFrame to a new Excel file
    df.to_excel(output_file, index=False)

process_data('F:/Sample Data.xlsx', 'F:/Output.xlsx')
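
A possible direction, offered as a sketch rather than a confirmed fix: `pd.read_excel` reads only the first sheet unless `sheet_name` is given, and the header lookup is case- and whitespace-sensitive, so either of those could explain the KeyError. The helpers `find_column` and `expand_rows` below are hypothetical names, not part of pandas. The sketch also builds the repeated rows in one step instead of appending to `df` while `apply` is iterating over it, and it assumes each link in the column corresponds to one sub-product.

```python
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup


def extract_href_count(html):
    """Count <a href=...> tags; non-string cells (e.g. NaN) count as zero."""
    if not isinstance(html, str):
        return 0
    return len(BeautifulSoup(html, "html.parser").find_all("a", href=True))


def find_column(df, wanted):
    """Hypothetical helper: match a header ignoring case and outer spaces."""
    lookup = {str(col).strip().lower(): col for col in df.columns}
    key = wanted.strip().lower()
    if key not in lookup:
        # Listing the real headers shows what pandas actually read.
        raise KeyError(f"{wanted!r} not found; columns are {list(df.columns)}")
    return lookup[key]


def expand_rows(df, column="What is Included"):
    """Hypothetical helper: repeat each row once per link (at least once)."""
    col = find_column(df, column)
    counts = df[col].map(extract_href_count).clip(lower=1)
    # Row positions repeated in place, so copies sit directly below the original.
    positions = np.repeat(np.arange(len(df)), counts.to_numpy())
    return df.iloc[positions].reset_index(drop=True)


# Usage, assuming the sheet name from the question:
# df = pd.read_excel('F:/Sample Data.xlsx', sheet_name='Missing Data Removed')
# expand_rows(df).to_excel('F:/Output.xlsx', index=False)
```

If `find_column` still raises, the column list in its message shows which headers were actually loaded, which should indicate whether the wrong sheet or a differently spelled header is the cause.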