generatedata.py
import pandas as pd
import numpy as np
# Generate sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Age': np.random.randint(20, 40, size=5),
    'Salary': np.random.randint(30000, 80000, size=5)
}
# Create a dataframe
df = pd.DataFrame(data)
# Perform some data manipulation
df['Bonus'] = df['Salary'] * 0.1
# Display the dataframe
print("Original dataframe:")
print(df)
# Store the data in an Excel file
excel_filename = 'sample_data.xlsx'
df.to_excel(excel_filename, index=False)
print(f"\nData saved to '{excel_filename}' successfully")
buildspec.yml
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.11
  pre_build:
    commands:
      - echo "Installing dependencies"
      - pip install pandas numpy
  build:
    commands:
      - echo "Running the script"
      - data-analysis/generatedata.py
      - aws s3 cp sample_data.xlsx s3://upload-data-files-to-s3/
I am trying to run this build so that it generates the Excel file and uploads it to an S3 bucket, but the build fails with an error saying it is unable to find the generatedata file, even though the repository name and source are configured correctly.
I can't work out what needs to be corrected here.
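
One way to narrow this down is sketched below. The build command `data-analysis/generatedata.py` hands the path to the shell as if it were an executable, so it can fail if the path is wrong relative to the source root, or if the file is not executable and has no shebang line. The helper `run_script` is a hypothetical name, not part of any AWS tooling. It checks that the file exists under the working directory, lists the `.py` files it does find if not, and then runs the script through the Python interpreter explicitly. It assumes the script lives at `data-analysis/generatedata.py` in the repository, as the buildspec suggests.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script_path, root="."):
    """Run a Python script with the current interpreter (hypothetical helper).

    Raises FileNotFoundError listing the .py files that do exist under
    `root`, which helps spot a wrong folder name or path.
    """
    root = Path(root).resolve()
    script = root / script_path
    if not script.is_file():
        # Show what is actually in the checkout to help spot a path mismatch.
        candidates = sorted(str(p.relative_to(root)) for p in root.rglob("*.py"))
        raise FileNotFoundError(
            f"{script} not found; .py files under {root}: {candidates}"
        )
    # Invoke the interpreter explicitly instead of relying on the file
    # being executable or having a shebang line.
    return subprocess.run([sys.executable, str(script)], cwd=root, check=True)
```

Calling `run_script("data-analysis/generatedata.py")` from the source root does roughly what the build command `python data-analysis/generatedata.py` would do. Separately, writing `.xlsx` files with `df.to_excel` needs an Excel engine such as `openpyxl`, which `pip install pandas numpy` does not install, so the script may fail at that step once the path problem is resolved.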