How can I rotate a picture in AWS S3 before analyzing it using Textract?


PROBLEM

I have a picture in an AWS S3 bucket. I want to read it, rotate it, and then pass it to Textract for analysis.

This is the code example from the official tutorial:

import io

import boto3
from PIL import Image, ImageDraw

def process_text_analysis(bucket, document):
    # Get the document from S3
    s3_connection = boto3.resource('s3')
    s3_object = s3_connection.Object(bucket, document)
    s3_response = s3_object.get()

    stream = io.BytesIO(s3_response['Body'].read())
    image = Image.open(stream)

    # Analyze the document
    client = boto3.client('textract')

    image_binary = stream.getvalue()
    response = client.analyze_document(Document={'Bytes': image_binary},
        FeatureTypes=["TABLES", "FORMS"])

    # Get the text blocks
    blocks = response['Blocks']
    width, height = image.size
    draw = ImageDraw.Draw(image)

    # Outline each block; polygon coordinates are ratios of the image size
    for block in blocks:
        points = []
        for polygon in block['Geometry']['Polygon']:
            points.append((width * polygon['X'], height * polygon['Y']))
        draw.polygon(points, outline='blue')

    # Display the image
    image.show()
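
The polygon handling in the code above can be isolated into a small helper. This is only a sketch: `polygon_to_pixels` and `sample_block` are names made up for illustration, and the sample data merely imitates the shape of one entry in `response['Blocks']`. Textract reports each polygon point as a ratio of the image width and height, so the helper multiplies by the pixel size of whichever image you draw on.

```python
def polygon_to_pixels(polygon, width, height):
    """Scale Textract's ratio-based polygon points to pixel coordinates."""
    return [(width * point["X"], height * point["Y"]) for point in polygon]


# Hypothetical block, shaped like one entry of response['Blocks'].
sample_block = {
    "Geometry": {
        "Polygon": [
            {"X": 0.0, "Y": 0.0},
            {"X": 0.5, "Y": 0.0},
            {"X": 0.5, "Y": 0.25},
            {"X": 0.0, "Y": 0.25},
        ]
    }
}

# Pixel coordinates for an 800 x 400 image.
pixels = polygon_to_pixels(sample_block["Geometry"]["Polygon"], 800, 400)
print(pixels)
```

The resulting list can be passed directly to `draw.polygon(pixels, outline='blue')`.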

QUESTIONS

  1. At which step should I add the rotation? If possible, I would like to avoid downloading the picture, rotating it locally, and uploading it back to S3.
  2. How can I rotate not only the picture but also the image binary (image_binary) that is sent to Textract through analyze_document()?

ATTEMPTS

I tried to add the rotation here:

def process_text_analysis(bucket, document):
    # Get the document from S3
    s3_connection = boto3.resource('s3')
    s3_object = s3_connection.Object(bucket, document)
    s3_response = s3_object.get()

    stream = io.BytesIO(s3_response['Body'].read())
    image = Image.open(stream)

    image = image.rotate(269, expand=1)  # <-- ROTATION HERE

    # Analyze the document
    client = boto3.client('textract')

    # stream still holds the original, unrotated bytes
    image_binary = stream.getvalue()
    response = client.analyze_document(Document={'Bytes': image_binary},
        FeatureTypes=["TABLES", "FORMS"])

    # Get the text blocks
    blocks = response['Blocks']
    width, height = image.size
    draw = ImageDraw.Draw(image)

    # Outline each block; polygon coordinates are ratios of the image size
    for block in blocks:
        points = []
        for polygon in block['Geometry']['Polygon']:
            points.append((width * polygon['X'], height * polygon['Y']))
        draw.polygon(points, outline='blue')

    # Display the image
    image.show()

But this only rotates the picture; it does not modify the image binary that is sent to Textract.
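
One possible direction, offered as a sketch and not a confirmed solution: `image.rotate()` returns a new in-memory image and never touches `stream`, so the rotated pixels have to be encoded into a fresh buffer before they are sent. The helper names `rotate_image_bytes` and `analyze_rotated` below are made up for illustration, and re-encoding as PNG is an assumption (chosen because it is lossless and accepted by Textract). Everything happens in memory, so nothing is written to disk or uploaded back to S3.

```python
import io

from PIL import Image


def rotate_image_bytes(raw_bytes, angle, output_format="PNG"):
    """Rotate an encoded image and return (rotated_image, rotated_bytes)."""
    image = Image.open(io.BytesIO(raw_bytes))
    rotated = image.rotate(angle, expand=True)

    # Re-encode the rotated pixels; the original buffer is never modified.
    buffer = io.BytesIO()
    rotated.save(buffer, format=output_format)
    return rotated, buffer.getvalue()


def analyze_rotated(bucket, document, angle):
    """Hypothetical wrapper: fetch from S3, rotate in memory, call Textract."""
    import boto3  # imported here so the helper above works without AWS access

    s3_object = boto3.resource("s3").Object(bucket, document)
    raw_bytes = s3_object.get()["Body"].read()

    rotated, rotated_bytes = rotate_image_bytes(raw_bytes, angle)
    response = boto3.client("textract").analyze_document(
        Document={"Bytes": rotated_bytes},
        FeatureTypes=["TABLES", "FORMS"],
    )
    return rotated, response
```

Because the returned polygons are relative to the rotated image that Textract received, the drawing step should use the size of `rotated`, not the size of the original picture.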
