Unable to access csv file generated by a jar file in AWS Glue

This is my first question here!

We're working on some MDM-related work in which we need to run a jar file provided by our MDM partner to merge records. We are able to run the jar file from our AWS Glue script using Python's subprocess module. All good so far. We are required to write the output location and file name in a property file, but unfortunately it only accepts Windows/Linux-style file paths, not S3 bucket links.

We did try this:

MERGE_OUTPUT_FILE_LOCATION:./filename

With this we were trying to point to the temporary directory of the Glue job (TempDir/filename is not accepted); the setting above is the only one with which the jar file begins to execute. We then tried to reference this file name under TempDir to create a DynamicFrame, which failed because no such file actually existed there.

create_jar_frame = glueContext.create_dynamic_frame.from_options(connection_type="s3",connection_options = {"paths": ["TempDir/filename.csv"], "recurse": True},format="csv")

Any idea where a file gets saved in AWS Glue when the given location is just

./filename

Any idea how we can reference the file generated in the temp directory and pull it into a DynamicFrame? Or should we create an EC2 instance/EMR and do it the long and hard way?

Best answer

It turns out that the temporary directory in AWS Glue works just like the one in AWS Lambda. Temporary files can be written to and read from this directory:

/tmp/

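/tmp is the local temporary folder of the Glue job, and it can be specified in the jar's property file as the location where the file is read or written. Note that this is a path on the job's local filesystem, not the S3 location configured as TempDir.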
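
One way to put this together, as a minimal sketch rather than the partner's documented procedure: write the property file with an absolute /tmp path and start the jar with its working directory pinned to /tmp, so that even a relative ./filename resolves there. The helper names, the jar name and its command-line arguments are hypothetical; only the MERGE_OUTPUT_FILE_LOCATION key and its key:value form come from the question.

```python
import subprocess
from pathlib import Path


def write_property_file(property_path, output_path):
    """Write a property file holding only the output location.

    Assumption: a real property file for the jar needs more keys than this.
    """
    Path(property_path).write_text(f"MERGE_OUTPUT_FILE_LOCATION:{output_path}\n")


def run_in_dir(command, work_dir, output_name):
    """Run `command` with `work_dir` as its working directory and return
    the path of the output file it is expected to create there."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # A relative path such as ./filename is resolved by the child process
    # against its current working directory, so pin that directory explicitly.
    subprocess.run(command, cwd=str(work_dir), check=True)
    output_path = work_dir / output_name
    if not output_path.exists():
        raise FileNotFoundError(f"expected output not found: {output_path}")
    return output_path


# Hypothetical usage inside the Glue script (jar name and arguments are made up):
# write_property_file("/tmp/merge.properties", "/tmp/filename.csv")
# csv_path = run_in_dir(
#     ["java", "-jar", "merge.jar", "/tmp/merge.properties"], "/tmp", "filename.csv"
# )
```

Setting cwd is optional once the property file holds an absolute path, but it removes any doubt about where ./filename ends up.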

Since AWS Glue runs on a Unix-like platform, file paths and read/write commands should be Unix-style.
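
To get the generated file into a DynamicFrame, one possible approach (an assumption, not something confirmed above) is to copy it from the local disk to S3 first and read it from there, since the s3 connection type expects s3:// paths. In the sketch below the function name, bucket and key are hypothetical, the S3 client and GlueContext are passed in by the caller, and withHeader assumes the CSV has a header row.

```python
def load_local_csv_as_dynamic_frame(local_path, bucket, key, s3_client, glue_context):
    """Upload a local CSV file to S3, then read it back as a DynamicFrame."""
    # boto3's S3 client provides upload_file(Filename, Bucket, Key).
    s3_client.upload_file(str(local_path), bucket, key)
    s3_uri = f"s3://{bucket}/{key}"
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [s3_uri]},
        format="csv",
        format_options={"withHeader": True},  # assumption: file has a header row
    )


# Hypothetical usage inside the Glue script (bucket and key are made up):
# import boto3
# frame = load_local_csv_as_dynamic_frame(
#     "/tmp/filename.csv", "my-bucket", "mdm/filename.csv",
#     boto3.client("s3"), glueContext,
# )
```

The job's IAM role would need write access to the chosen bucket for the upload to succeed.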