JS : Getting datatype from CSV using Amazon S3 Select

951 Views Asked by At

I am trying to read a CSV from amazon S3 bucket (this could be any CSV so I do not have the header/datatype info ahead of read.

I am able to get the header info using :

const params = {
    Bucket: 'mybucket',
    Key: file,
    ExpressionType: 'SQL',
    Expression: 'SELECT * FROM S3Object s LIMIT 1',
    InputSerialization: {
      CSV: {
        FileHeaderInfo: 'NONE',
        RecordDelimiter: '\n',
        FieldDelimiter: ','
      }
    },
    OutputSerialization: {
      CSV: {}
    }
  };

This gives me the header info.

  1. Is there a way to get the nearest datatype (or is it always as a string). Is there any other way to get the datatype of csv in javascript. (If I know the columns I can cast but this is dynamic)

  2. S3 Select docs says it is possible to do aggregates but on trying the same it always errors with "The column index at line 1, column 12 is invalid. Please check the service documentation and try again."

select MAX(cast(totalamount as float)) from s3object s

Error


The column index at line 1, column 12 is invalid. Please check the service documentation and try again.

Thanks

1

There are 1 best solutions below

0
On
  1. S3 select documentation states that "If CAST is not defined, then all input data is treated as a string. It must be cast into the relevant data types when necessary." So S3 select will treat everything as string.
  2. In order to do aggregates on the csv file you need to exclude the header row. Try setting FileHeaderInfo to Use to specify First line is a header, and you can use the header value to identify a column in an expression. The query should then give you expected results. https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html