Getting OverMaxRecordSize when querying through S3 Select


I get the following error whenever I run any query or try to fetch any record:

The character number in one record is more than our max threshold, maxCharsPerRecord: 1,048,576

I've tried switching the input format from JSON to CSV, but that didn't help. After some research I found that this is a limit of the AWS service itself. Is there a way to find out which record is larger than 1 MB in the downloaded properties.ldjson.gz file?

Answer:

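Download the file and scan it for lines longer than the limit; in an .ldjson file each line is one record. Either of the following prints the line number of every oversized record: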

Unix:

gunzip -c properties.ldjson.gz | awk 'length > 1048576 {print NR ": " length}'

Python:

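import gzip

# Stream the gzipped file in text mode; each line is one JSON record
with gzip.open('properties.ldjson.gz', 'rt', encoding='utf-8') as f:
    for i, line in enumerate(f, start=1):
        length = len(line.rstrip('\r\n'))  # don't count the newline delimiter
        if length > 1048576:
            print("Line", i, "has", length, "characters")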
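If you also want a hint about *why* a record is too big, the same scan can be wrapped in a function that reports each oversized record's character count, its UTF-8 byte count (the error counts characters, while "1 MB" is a byte size, and the two differ for non-ASCII data), and the top-level field with the longest value. This is a minimal sketch, assuming the file is UTF-8 JSON Lines with one record per line; `find_oversized_records` and `MAX_CHARS_PER_RECORD` are hypothetical names, not part of any AWS tool.

```python
import gzip
import json
from typing import List, Optional, Tuple

MAX_CHARS_PER_RECORD = 1_048_576  # limit quoted in the S3 Select error message


def find_oversized_records(
    path: str, limit: int = MAX_CHARS_PER_RECORD
) -> List[Tuple[int, int, int, Optional[str]]]:
    """Return (line_number, char_count, utf8_byte_count, largest_key) for every
    line of a gzipped JSON Lines file longer than `limit` characters.

    largest_key is the top-level key whose JSON-serialized value is longest,
    or None when the line does not parse as a non-empty JSON object.
    """
    results = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            record = line.rstrip("\r\n")  # the line delimiter is not part of the record
            if len(record) <= limit:
                continue
            try:
                obj = json.loads(record)
            except json.JSONDecodeError:
                obj = None
            largest_key = None
            if isinstance(obj, dict) and obj:
                # Compare fields by the length of their serialized values
                largest_key = max(
                    obj, key=lambda k: len(json.dumps(obj[k], ensure_ascii=False))
                )
            results.append(
                (line_no, len(record), len(record.encode("utf-8")), largest_key)
            )
    return results
```

For example, `find_oversized_records("properties.ldjson.gz")` returns one tuple per offending line, so you can see which field to trim or split before re-uploading.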