"C:\Users\manoj\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexes\base.py", line 3797, in get_loc raise KeyError(key) from err
My data is in an Excel file; the link to access the file is below:
There are two sheets in the workbook above. I am trying to process the "Missing Data Removed" sheet so that the output matches the "Orignal Sheet" sheet.
This is the output file I am trying to produce:
For this I am referring to column T, "what is Included". On checking the HTML in that column, I found that each product has multiple sub-products, so I need to append a row for each sub-product below its product's row, with all the required information (highlighted in green).
import pandas as pd
from bs4 import BeautifulSoup

def extract_href_count(html):
    soup = BeautifulSoup(html, 'html.parser')
    return len(soup.find_all('a', href=True))

def process_data(input_file, output_file):
    # Read the Excel file into a DataFrame
    df = pd.read_excel(input_file)

    # Define a function to process each row
    def process_row(row):
        href_count = extract_href_count(row['What is Included'])
        # If more than one href is found, append new rows
        if href_count > 1:
            for _ in range(href_count - 1):
                df.loc[len(df.index)] = row

    # Apply the processing function to each row
    df.apply(process_row, axis=1)
    # Save the updated DataFrame to a new Excel file
    df.to_excel(output_file, index=False)

process_data('F:/Sample Data.xlsx', 'F:/Output.xlsx')
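
For reference, here is a minimal sketch of the row expansion described above. It is not a tested solution for this workbook. It assumes each sub-product is an <a href="..."> link in the column, and it emits one extra row per link, placed directly below the parent row. The code above instead appends href_count - 1 copies at the end of the DataFrame. To keep it runnable without extra packages, the sketch uses the standard library's html.parser in place of BeautifulSoup and plain dicts in place of a DataFrame. The names LinkCollector, extract_links, find_column, expand_rows, the "Sub Product" and "Sub Product URL" columns, and the sample rows are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    # Non-string cells (e.g. NaN from an empty Excel cell) yield no links.
    if not isinstance(html, str):
        return []
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def find_column(columns, wanted):
    """Match a header while ignoring case and surrounding whitespace."""
    key = wanted.strip().lower()
    for name in columns:
        if str(name).strip().lower() == key:
            return name
    raise KeyError(f"{wanted!r} not found; available columns: {list(columns)}")


def expand_rows(rows, column):
    """Return each row followed by one copy per link found in row[column]."""
    expanded = []
    for row in rows:
        expanded.append(dict(row))
        for href, text in extract_links(row.get(column)):
            child = dict(row)
            child["Sub Product"] = text        # hypothetical column name
            child["Sub Product URL"] = href    # hypothetical column name
            expanded.append(child)
    return expanded


# Made-up sample rows standing in for the "Missing Data Removed" sheet.
rows = [
    {"Product": "Kit A",
     "what is Included": '<a href="/p/1">Bolt</a> <a href="/p/2">Nut</a>'},
    {"Product": "Kit B", "what is Included": None},
]
col = find_column(rows[0].keys(), "What is Included")
out = expand_rows(rows, col)
```

With pandas, the input rows could come from df.to_dict("records"), and the result could be written back with pd.DataFrame(out).to_excel(output_file, index=False). A KeyError from get_loc means the requested label is not among the DataFrame's columns, so it may help to print df.columns and compare the header exactly, including case and stray spaces. Note also that pd.read_excel reads the first sheet unless sheet_name is given, so if "Missing Data Removed" is not the first sheet, passing sheet_name="Missing Data Removed" may be needed.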