How to build and test AWS Glue ETL Spark code locally in VS Code?


I am new to AWS Glue and I have been assigned to create an AWS Glue ETL job. We have only an AWS production environment in our project. I want to know how to set up my VS Code IDE so that I can build and test my Glue code.

I have seen a solution that uses a Docker image, but I want a solution without Docker. Thanks.


2 Answers


In my company we created an abstraction using the AWS Wrangler library to run queries remotely and save the results in a local cache (CSV). We then convert the pandas DataFrame returned by AWS Wrangler back to a PySpark DataFrame and debug it 100% locally. Works like a charm: no Docker, super lightweight, and effective!
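A minimal sketch of that approach, assuming you have a local PySpark install, the awswrangler package, and AWS credentials configured. The query, database name, and cache path below are hypothetical placeholders, not the abstraction the answer describes:

```python
import os

import awswrangler as wr
import pandas as pd
from pyspark.sql import SparkSession

CACHE_PATH = "local_cache/customers.csv"           # hypothetical cache file
SQL = "SELECT * FROM customers LIMIT 1000"         # hypothetical query


def fetch_with_cache() -> pd.DataFrame:
    """Run the query remotely via AWS Wrangler, caching the result as CSV."""
    if os.path.exists(CACHE_PATH):
        return pd.read_csv(CACHE_PATH)
    # read_sql_query executes the query on Athena and returns a pandas DataFrame
    df = wr.athena.read_sql_query(SQL, database="my_database")  # hypothetical DB
    os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
    df.to_csv(CACHE_PATH, index=False)
    return df


spark = SparkSession.builder.master("local[*]").appName("glue-local-debug").getOrCreate()

# Convert the cached pandas DataFrame to a PySpark DataFrame so the
# transformation logic can be debugged entirely on your machine.
spark_df = spark.createDataFrame(fetch_with_cache())
spark_df.printSchema()
```

After the first run, everything reads from the local CSV, so you can step through your Spark transformations in the VS Code debugger with no AWS round trips.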


I think this great AWS blog post lists your options for developing AWS Glue jobs locally: https://aws.amazon.com/blogs/big-data/building-an-aws-glue-etl-pipeline-locally-without-an-aws-account/

Using the Docker image provided by AWS gives you better integration and access to S3 data and the Glue Data Catalog. But as you have written, you want a solution without Docker; the remaining option is then the AWS Glue ETL library (aws-glue-libs).
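A minimal sketch of a Glue script you could run locally once aws-glue-libs and its matching Spark distribution are set up as the blog post describes. The data and column names here are hypothetical; the point is exercising Glue transforms without any AWS access:

```python
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.transforms import ApplyMapping
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Build a small in-memory dataset instead of reading from S3 or the
# Data Catalog, so the job logic can be tested fully offline.
df = spark.createDataFrame(
    [("1", "alice"), ("2", "bob")],
    ["id", "name"],
)
dyf = DynamicFrame.fromDF(df, glueContext, "local_test")

# Apply a Glue transform exactly as the real job would.
renamed = ApplyMapping.apply(
    frame=dyf,
    mappings=[
        ("id", "string", "customer_id", "string"),
        ("name", "string", "customer_name", "string"),
    ],
)
renamed.toDF().show()
```

You can run a script like this with the `gluesparksubmit` wrapper shipped with aws-glue-libs, or point VS Code's Python interpreter at that environment and debug it like any other PySpark script.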