Converting CSV to Parquet file format using a script in SAP BODS

My BODS job creates CSV files.

Is there a way to convert the CSV files to Parquet and upload them to an S3 bucket in SAP BODS?

Below is my current approach for converting the CSV to Parquet:

  1. Create a CSV file in a folder that BODS can access.
  2. Create a Python script, place it in the package folder, and use the code below:
import sys
# Make the Data Services Python's site-packages importable (os.chdir is not needed for this)
sys.path.append('/usr/sap/DBO/dataservices/DataQuality/python/lib/python3.7/site-packages')
import pandas as pd
# Read the CSV file written by the BODS job
df = pd.read_csv('/ds_ext_share/BODS_DEV/Output/xxxx.csv')
# Write Parquet; pandas needs a Parquet engine (pyarrow or fastparquet) for this
df.to_parquet('/ds_ext_share/BODS_DEV/Output/xxxx.parquet')
  3. Call the above script with exec() from a script in the BODS job:
exec('/usr/sap/DBO/dataservices/DataQuality/python/lib/python3.7/site-packages','XXXX.py' , '8');
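The steps above are easier to debug if the script takes the paths as arguments instead of hard-coding them, and exits with a non-zero status plus a message on stderr when the conversion fails, so a failure is not silent. A minimal sketch, assuming pandas and a Parquet engine (pyarrow or fastparquet) are installed for the interpreter that runs it; the file name csv_to_parquet.py and its argument layout are hypothetical:

```python
import os
import sys


def parquet_path_for(csv_path: str) -> str:
    """Return the .parquet path next to the given CSV file (same name, new extension)."""
    return os.path.splitext(csv_path)[0] + ".parquet"


def convert(csv_path: str, parquet_path: str) -> None:
    """Read a CSV file with pandas and write it as Parquet."""
    # Imported here so that a missing pandas is reported as a conversion error in main().
    import pandas as pd

    df = pd.read_csv(csv_path)
    # to_parquet needs a Parquet engine (pyarrow or fastparquet) to be installed.
    df.to_parquet(parquet_path)


def main(argv: list) -> int:
    """Hypothetical CLI: csv_to_parquet.py <input.csv> [output.parquet]."""
    if len(argv) not in (2, 3):
        print("usage: csv_to_parquet.py <input.csv> [output.parquet]", file=sys.stderr)
        return 2
    csv_path = argv[1]
    parquet_path = argv[2] if len(argv) == 3 else parquet_path_for(csv_path)
    try:
        convert(csv_path, parquet_path)
    except Exception as exc:  # report any failure (missing module, bad path, ...) on stderr
        print(f"conversion failed: {exc}", file=sys.stderr)
        return 1
    print(parquet_path)
    return 0
```

To run it as a script, end the file with `if __name__ == "__main__": sys.exit(main(sys.argv))` and invoke it through the interpreter, e.g. `python csv_to_parquet.py /ds_ext_share/BODS_DEV/Output/xxxx.csv`.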

The above approach does not work: the CSV file is not converted to Parquet. How can I make the conversion work?

Answer 1:

Removing BODS from the equation, the question becomes how to convert a CSV file to Parquet in Python with pandas and PyArrow. This has been asked and answered in a similar thread.
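For reference, PyArrow can also do the conversion on its own, without pandas, using pyarrow.csv.read_csv and pyarrow.parquet.write_table. A minimal sketch, assuming pyarrow is installed for the interpreter that BODS calls; the function name is hypothetical:

```python
def csv_to_parquet_arrow(csv_path: str, parquet_path: str) -> int:
    """Convert a CSV file to Parquet with PyArrow only and return the number of rows written."""
    # Imported lazily so this module can be loaded even where pyarrow is missing.
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    table = pv.read_csv(csv_path)  # column types are inferred from the CSV content
    pq.write_table(table, parquet_path)
    return table.num_rows
```

Reading straight into an Arrow table skips the intermediate pandas DataFrame; pandas remains the simpler choice if the data needs cleaning before it is written.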

Answer 2:

I assume your Python code works as expected, i.e. it can read the CSV file and write the Parquet file.

The problem is the exec() function call in your BODS script.

exec('/usr/sap/DBO/dataservices/DataQuality/python/lib/python3.7/site-packages','XXXX.py' , '8');

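exec() runs an external command or shell script. The first parameter is the command or script to execute, the second is the parameter list passed to that command or script, and the third is a flag that controls how exec() behaves, such as whether it waits for the command to complete and whether it returns errors and output. In your call, the first parameter is the site-packages directory, which is not an executable, so XXXX.py is never run.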

Please refer to the SAP Help documentation of the exec() function for the full list of flag values.
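To see why the original call does nothing useful, it helps to think of exec() as a plain process launch: whatever comes first is the program that gets executed. The Python function below is only an illustrative analogue (run_like_exec is hypothetical and is not how BODS implements exec()); it launches the first argument with the whitespace-split parameter string as its arguments:

```python
import subprocess


def run_like_exec(command: str, params: str) -> str:
    """Illustrative analogue of exec(<command>, <params>, ...); NOT the BODS implementation.

    The first argument is the program to run; the parameter string is split on
    whitespace and passed to it as command-line arguments.
    """
    try:
        done = subprocess.run([command, *params.split()], capture_output=True, text=True)
    except OSError as exc:
        # Raised when the "command" cannot be executed, e.g. it is a directory or missing.
        return f"error: {exc}"
    return f"{done.returncode}: {done.stdout}{done.stderr}"
```

On Linux, run_like_exec('/usr/sap/DBO/dataservices/DataQuality/python/lib/python3.7/site-packages', 'XXXX.py') returns an error because a directory cannot be executed, whereas putting an interpreter first, as in run_like_exec(sys.executable, '/path/to/XXXX.py'), actually runs the script.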

The BODS script should look like this:

exec('/path/to/python/python', '/usr/sap/DBO/dataservices/DataQuality/python/lib/python3.7/site-packages/XXXX.py', 8);
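Because the first argument decides which Python interpreter runs XXXX.py, it is worth confirming that this interpreter can import pandas and a Parquet engine, since DataFrame.to_parquet needs pyarrow or fastparquet. A small check script, assuming it is saved under a hypothetical name such as check_deps.py and run with the same interpreter path:

```python
import importlib.util
import sys


def missing_modules(names: list) -> list:
    """Return the module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pandas for read_csv/to_parquet; pyarrow is one of the Parquet engines pandas can use
    missing = missing_modules(["pandas", "pyarrow"])
    print(sys.executable, sys.version.split()[0])
    print("missing: " + (", ".join(missing) if missing else "none"))
```

Running it once from a shell on the Job Server host, e.g. `/path/to/python/python check_deps.py`, shows which interpreter is used and whether anything still needs to be installed into it.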

Alternatively, create a shell script bods.sh (and make it executable with chmod +x bods.sh):

#!/bin/sh
python /usr/sap/DBO/dataservices/DataQuality/python/lib/python3.7/site-packages/XXXX.py

Then call it from the BODS script, passing the full path to bods.sh and an empty parameter list:

exec('/path/to/bods.sh', '', 8);
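For the S3 part of the question, the same Python step could upload the Parquet file once it is written. A minimal sketch, assuming boto3 is installed for that interpreter and AWS credentials are available to the Job Server user (for example through boto3's default credential chain); the bucket name, prefix, and function names are hypothetical:

```python
from pathlib import PurePosixPath


def s3_key_for(local_path: str, prefix: str) -> str:
    """Build an S3 object key of the form <prefix>/<file name> (or just the file name)."""
    name = PurePosixPath(local_path).name
    prefix = prefix.strip("/")
    return f"{prefix}/{name}" if prefix else name


def upload_to_s3(local_path: str, bucket: str, prefix: str = "") -> str:
    """Upload a local file to s3://<bucket>/<key> and return the key used."""
    # Imported lazily; boto3 must be installed for the interpreter that BODS calls.
    import boto3

    key = s3_key_for(local_path, prefix)
    # upload_file(Filename, Bucket, Key) picks up credentials from the default chain.
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```

For example, upload_to_s3('/ds_ext_share/BODS_DEV/Output/xxxx.parquet', 'my-bucket', 'bods/output') would write the object s3://my-bucket/bods/output/xxxx.parquet.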