How to convert an XLSX file to CSV using AWS Glue DataBrew


Is it possible to upload an Excel file to an S3 bucket (input location = XLSX file), create a DataBrew dataset from that Excel file, and create a recipe in AWS Glue DataBrew that converts it to a CSV file (output location = transformed CSV file)?

The short answer is yes, though perhaps not in the way you expect. Here is one way:

  1. Create a dataset based on the XLSX file or files (as you mentioned in your question).
  2. Open a project using this dataset (see the DataBrew documentation if you need details).
  3. Apply transformations to the dataset as needed. If you have no transformations to apply, just rename a column for now. You need at least one transformation for the next step.
  4. Click "Create Job" in the top-right corner of the project page.
  5. Enter the job details as necessary (see the official docs). Under "Job output settings", select the following, as shown in the screenshot:
    1. File type as "CSV" (selected by default)
    2. Delimiter as "Comma (,)" (selected by default)
  6. Click "Create and run job".

[Screenshot: "Job output settings" with file type "CSV" and delimiter "Comma (,)"]
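If you would rather script this than click through the console, the same steps can be roughly sketched with boto3's `databrew` client. This is only a sketch under assumptions: every name, bucket, key, column, and role ARN passed in is a hypothetical placeholder, the rename step stands in for "at least one transformation", and I am assuming the first published recipe version is `1.0`, so check that against your account before relying on it.

```python
def dataset_request(name, bucket, key):
    """Arguments for create_dataset: an Excel file in S3 (first sheet, header row)."""
    return {
        "Name": name,
        "Format": "EXCEL",
        "FormatOptions": {"Excel": {"SheetIndexes": [0], "HeaderRow": True}},
        "Input": {"S3InputDefinition": {"Bucket": bucket, "Key": key}},
    }


def recipe_request(name, source_column, target_column):
    """Arguments for create_recipe: a single placeholder step that renames a column."""
    return {
        "Name": name,
        "Steps": [{
            "Action": {
                "Operation": "RENAME",
                "Parameters": {"sourceColumn": source_column,
                               "targetColumn": target_column},
            }
        }],
    }


def job_request(name, dataset, recipe, role_arn, bucket, prefix):
    """Arguments for create_recipe_job: write comma-delimited CSV to S3."""
    return {
        "Name": name,
        "DatasetName": dataset,
        # Assumption: the first published version of a recipe is "1.0".
        "RecipeReference": {"Name": recipe, "RecipeVersion": "1.0"},
        "RoleArn": role_arn,
        "Outputs": [{
            "Format": "CSV",
            "FormatOptions": {"Csv": {"Delimiter": ","}},
            "Location": {"Bucket": bucket, "Key": prefix},
            "Overwrite": True,
        }],
    }


def run_conversion(in_bucket, in_key, out_bucket, out_prefix, role_arn,
                   source_column, target_column):
    """Create dataset, recipe, and job, then start the job. Needs AWS credentials."""
    import boto3  # imported here so the request builders work without boto3

    client = boto3.client("databrew")
    # All resource names below are hypothetical.
    client.create_dataset(**dataset_request("xlsx-input", in_bucket, in_key))
    client.create_recipe(**recipe_request("xlsx-to-csv", source_column, target_column))
    client.publish_recipe(Name="xlsx-to-csv")
    client.create_recipe_job(**job_request(
        "xlsx-to-csv-job", "xlsx-input", "xlsx-to-csv",
        role_arn, out_bucket, out_prefix))
    return client.start_job_run(Name="xlsx-to-csv-job")["RunId"]
```

Calling `run_conversion(...)` with your own bucket names, a role that DataBrew can assume, and an existing column name should start a job run and return its run ID. The request builders are kept separate so you can inspect the payloads before anything is sent to AWS.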

Please note:

If you only have one file, or just a couple of files, to convert, there are easier ways to go from XLSX to CSV. In MS Excel, simply open the file and use "Save As" with CSV as the file type, as shown in the screenshot below.

[Screenshot: Excel "Save As" with CSV selected as the file type]
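If you do not have Excel at hand, a small local script is another option for a handful of files. This is not the "Save As" route described above but a swapped-in alternative, sketched on the assumption that `pandas` and `openpyxl` are installed and that only the first sheet matters; the file name in the usage note is hypothetical.

```python
from pathlib import Path


def csv_path_for(xlsx_path):
    """Return the output path: same location and name, with a .csv extension."""
    return Path(xlsx_path).with_suffix(".csv")


def convert_xlsx_to_csv(xlsx_path):
    """Read the first sheet of an XLSX file and write it out as CSV."""
    import pandas as pd  # needs openpyxl installed to read .xlsx files

    out_path = csv_path_for(xlsx_path)
    frame = pd.read_excel(xlsx_path, sheet_name=0)
    frame.to_csv(out_path, index=False)  # index=False omits the row-number column
    return out_path
```

For example, `convert_xlsx_to_csv("report.xlsx")` would write `report.csv` next to the input file. Formatting and formulas are not preserved; only the cell values that pandas reads end up in the CSV.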