I have a node.js function for AWS Lambda. It reads a JSON file from an S3 bucket as a stream, parses it and prints the parsed objects to the console. I am using stream-json module for parsing.
It works on my local environment and prints the objects to console. But it does not print the objects to the log streams(CloudWatch) on Lambda. It simply times out after the max duration. It prints other log statements around, but not the object values.
1. Using node.js 6.10 in both environments.
2. callback to the Lambda function is invoked only after the stream ends.
3. Lambda has full access to S3
4. Also tried Promise to wait until streams complete. But no change.
What am I missing? Thank you in advance.
const AWS = require('aws-sdk');
const {parser} = require('stream-json');
const {streamArray} = require('stream-json/streamers/StreamArray');
const {chain} = require('stream-chain');
const S3 = new AWS.S3({ apiVersion: '2006-03-01' });
/** ******************** Lambda Handler *************************** */
exports.handler = (event, context, callback) => {
// Get the object from the event and show its content type
const bucket = event.Records[0].s3.bucket.name;
const key = event.Records[0].s3.object.key;
const params = {
Bucket: bucket,
Key: key
};
console.log("Source: " + bucket +"//" + key);
let s3ReaderStream = S3.getObject(params).createReadStream();
console.log("Setting up pipes");
const pipeline = chain([
s3ReaderStream,
parser(),
streamArray(),
data => {
console.log(data.value);
}
]);
pipeline.on('data', (data) => console.log(data));
pipeline.on('end', () => callback(null, "Stream ended"));
};
I have figured out that it is because my Lambda function is running inside a private VPC.
(I have to run it inside a private VPC because it needs to access my ElastiCache instance. I removed related code when I posted the code, for simplification).
Code can access S3 from my local machine, but not from the private VPC.
There is a process to ensure that S3 is accessible from within your VPC. It is posted here https://aws.amazon.com/premiumsupport/knowledge-center/connect-s3-vpc-endpoint/
Here is another link that explains how you should setup a VPC end point to be able to access AWS resources from within a VPC https://aws.amazon.com/blogs/aws/new-vpc-endpoint-for-amazon-s3/