My BODS job creates CSV files.
Is there a way to convert the CSV files to Parquet and upload them to an S3 bucket in SAP BODS?
The current approach I am using to convert the CSV to Parquet is below.
- Create a CSV file in a folder that BODS can access.
- Create a Python script and place it in the package folder; the code is below:
import os, sys

# Point the script at the Python packages bundled with Data Services
os.chdir("/usr/sap/DBO/dataservices/DataQuality/python/lib/python3.7/site-packages")
sys.path.append('/usr/sap/DBO/dataservices/DataQuality/python/lib/python3.7/site-packages')

import pandas as pd

# Read the CSV produced by the job and write it back out as Parquet
df = pd.read_csv('/ds_ext_share/BODS_DEV/Output/xxxx.csv')
df.to_parquet('/ds_ext_share/BODS_DEV/Output/xxxx.parquet')
- I am calling the above script with exec() in a script step of my BODS job:
exec('/usr/sap/DBO/dataservices/DataQuality/python/lib/python3.7/site-packages', 'XXXX.py', '8');
The above is not working: the CSV file is not being converted to Parquet. How can I get the conversion to work?
Removing BODS from the equation, the question remains how to convert a CSV file to Parquet in Python with pandas and PyArrow. This has been asked and answered in a similar thread here.
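For reference, a minimal sketch of that conversion on its own, assuming pandas and pyarrow are installed in the interpreter that runs the script; the file paths are placeholders:

import pandas as pd

# Read the CSV written by the job (placeholder path)
df = pd.read_csv('/path/to/input.csv')

# Write it out as Parquet; to_parquet needs pyarrow (or fastparquet) importable
df.to_parquet('/path/to/output.parquet', engine='pyarrow', index=False)

If pyarrow cannot be imported from that site-packages directory, to_parquet raises an ImportError and no Parquet file is written, so it is worth running the script manually once and checking its output.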