How do I use AWS Lambda to trigger Comprehend with S3?


I'm currently using AWS Lambda to trigger an Amazon Comprehend job, but the code only runs sentiment analysis on a single piece of text.

import boto3

def lambda_handler(event, context):
    s3 = boto3.client("s3")
    bucket = "bucketName"
    key = "textName.txt"
    file = s3.get_object(Bucket=bucket, Key=key)

    # Decode the object body rather than calling str() on raw bytes,
    # which would include the b'...' wrapper in the text
    analysisdata = file['Body'].read().decode('utf-8')

    comprehend = boto3.client("comprehend")

    sentiment = comprehend.detect_sentiment(Text=analysisdata, LanguageCode="en")
    print(sentiment)

    return 'Sentiment detected'

I want to process a file where each line is a separate piece of text to analyze for sentiment (this is an option when you set up a Comprehend job manually in the console). Is there a way to alter this code to do that, and to have the resulting sentiment analysis output file placed in the same S3 bucket? Thank you in advance.
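One way to do this without leaving Lambda is to split the file into lines and send them to Comprehend's synchronous batch API. Below is a minimal sketch of that approach; the bucket, key, and output key reuse the names from the question, and `batch_detect_sentiment` (a real boto3 Comprehend call, limited to 25 documents per request) is substituted for `detect_sentiment`:

```python
def split_documents(text):
    # Helper: treat each non-empty line as a separate document
    return [line.strip() for line in text.splitlines() if line.strip()]

def lambda_handler(event, context):
    import boto3  # boto3 is provided by the Lambda runtime

    bucket = "bucketName"
    key = "textName.txt"

    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    lines = split_documents(body)

    comprehend = boto3.client("comprehend")
    results = []
    # batch_detect_sentiment accepts at most 25 documents per call
    for i in range(0, len(lines), 25):
        batch = lines[i:i + 25]
        response = comprehend.batch_detect_sentiment(TextList=batch, LanguageCode="en")
        results.extend(r["Sentiment"] for r in response["ResultList"])

    # Write one sentiment label per input line back to the same bucket
    output_key = "textName-sentiment.txt"  # hypothetical output name
    s3.put_object(Bucket=bucket, Key=output_key,
                  Body="\n".join(results).encode("utf-8"))
    return "Sentiment detected"
```

Note that each document must stay under Comprehend's per-document size limit, and very large files are better handled by the asynchronous job API described in the answer below.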

1 Answer

Answered by John Rotenstein

It looks like you can use start_sentiment_detection_job():

response = client.start_sentiment_detection_job(
    InputDataConfig={
        'S3Uri': 'string',
        'InputFormat': 'ONE_DOC_PER_FILE'|'ONE_DOC_PER_LINE',
        'DocumentReaderConfig': {
            'DocumentReadAction': 'TEXTRACT_DETECT_DOCUMENT_TEXT'|'TEXTRACT_ANALYZE_DOCUMENT',
            'DocumentReadMode': 'SERVICE_DEFAULT'|'FORCE_DOCUMENT_READ_ACTION',
            'FeatureTypes': [
                'TABLES'|'FORMS',
            ]
        }
    },
    OutputDataConfig={
        'S3Uri': 'string',
        'KmsKeyId': 'string'
    },
    ...
)

It can read from an object in Amazon S3 (S3Uri) and store the output in an S3 object.

It looks like you could use 'InputFormat': 'ONE_DOC_PER_LINE' to meet your requirements.
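Putting that together, a minimal sketch of starting the job from Lambda might look like the following. The S3 URIs and the IAM role ARN are placeholders; `start_sentiment_detection_job` requires `DataAccessRoleArn` and `LanguageCode` in addition to the input/output configs shown above:

```python
def build_sentiment_job_params(input_uri, output_uri, role_arn):
    # One document per line matches a file where each line is a separate text
    return {
        "InputDataConfig": {
            "S3Uri": input_uri,
            "InputFormat": "ONE_DOC_PER_LINE",
        },
        "OutputDataConfig": {"S3Uri": output_uri},
        "DataAccessRoleArn": role_arn,
        "LanguageCode": "en",
    }

if __name__ == "__main__":
    import boto3

    comprehend = boto3.client("comprehend")
    params = build_sentiment_job_params(
        "s3://bucketName/textName.txt",       # input file, one doc per line
        "s3://bucketName/sentiment-output/",  # results land in the same bucket
        "arn:aws:iam::123456789012:role/ComprehendS3AccessRole",  # placeholder role
    )
    job = comprehend.start_sentiment_detection_job(**params)
    print(job["JobId"])  # poll with describe_sentiment_detection_job
```

The role must grant Comprehend read access to the input prefix and write access to the output prefix; the job runs asynchronously and writes its results as a compressed output file under the output URI.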