Decode binary file in AWS environment using PySpark


Is it possible to consume a Netezza backup file in an AWS environment and load it into Redshift? The file is a compressed binary created using the query below. It can also be produced for a full database using the nzbackup utility in Netezza.

CREATE EXTERNAL TABLE 'C:\filename.bak' USING (REMOTESOURCE 'ODBC' FORMAT 'internal' COMPRESS TRUE)
AS SELECT * FROM schema.tablename;

or 

nzbackup -dir /home/user/backups -u user -pw password -db db1

I want to decode this file and load it into a DataFrame in the AWS environment using Python or PySpark (AWS Glue). The following are the steps I am planning in AWS; I need guidance on the first step: how to decode a compressed binary backup from Netezza.

  1. Decode the file to ASCII (how?)
  2. Load it into a DataFrame and write out Parquet (see the sketch after this list)
  3. COPY the Parquet files to Redshift
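
For reference, here is a minimal PySpark sketch of steps 2 and 3, assuming step 1 has already produced a decoded, pipe-delimited ASCII file on S3 (the decoding itself is the open question). The bucket paths, the delimiter, and the IAM role are hypothetical placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("netezza-to-redshift").getOrCreate()

# Step 2: read the already-decoded delimited file and write Parquet to S3.
df = spark.read.csv(
    "s3://my-bucket/decoded/tablename.txt",  # placeholder path
    sep="|",             # assuming a pipe-delimited export
    header=False,
    inferSchema=True,
)
df.write.mode("overwrite").parquet("s3://my-bucket/parquet/tablename/")

# Step 3: COPY the Parquet output into Redshift; run this statement with
# any Redshift client (e.g. redshift_connector or the Query Editor).
copy_sql = """
COPY schema.tablename
FROM 's3://my-bucket/parquet/tablename/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS PARQUET;
"""

If the decoded file ends up in a different layout, only the read options in step 2 should need to change.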