Running dataflow locally on Cloud Run


I am trying to automate functional tests for a streaming Dataflow pipeline. The pipeline listens to a Pub/Sub topic that lies outside our project, and we do not have permission to publish messages to it for testing. When I test the pipeline manually, I run it locally and connect it to our internal Pub/Sub topic, where we can publish messages ourselves. I want to automate the same setup for our functional tests. My idea is:

  1. Run the pipeline locally (listening to the internal topic) in a container on Cloud Run
  2. Start the functional test by publishing a message to the internal topic
  3. Wait a few seconds and check whether the message has been written to BigQuery
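
Steps 2 and 3 can be sketched as a small polling helper, so the test waits only as long as it needs to instead of sleeping for a fixed time. This is a minimal sketch, not the asker's code: `wait_until`, `run_functional_test`, `publish_message`, and `row_exists` are hypothetical names, and the two callables stand in for whatever Pub/Sub publish call and BigQuery lookup the test actually uses.

```python
import time


def wait_until(check, timeout_s=60.0, interval_s=2.0):
    """Call check() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} seconds")
        time.sleep(interval_s)


def run_functional_test(publish_message, row_exists, timeout_s=60.0, interval_s=2.0):
    """Publish one test message, then poll until the result is visible.

    publish_message: hypothetical callable that publishes to the internal topic.
    row_exists: hypothetical callable that returns True once the expected
        row can be found in BigQuery.
    """
    publish_message()
    wait_until(row_exists, timeout_s, interval_s)
```

A test would pass in two small functions of its own, for example one that publishes a message carrying a unique ID and one that queries the target table for that ID.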

For step 1, I wrote the Dockerfile below (some commands are omitted for simplicity):

# Use the official lightweight Python image.
# https://hub.docker.com/_/python
FROM python:3.8-slim

# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED=True

# Install git
RUN apt-get update \
    && apt-get install -y git

# Make ssh dir
RUN mkdir /root/.ssh/

# Copy over private key, and set permissions

# Change directory and clone git repo

# Set working directory and authenticate service account

# Install dependencies
RUN pip install -r requirements.txt

# Run dataflow locally
CMD ["python", "main.py", "deploy", "local"]

Then I built the image, which created it in Container Registry. I went to the Cloud Run console and tried to create a service from this image, and got the error below:

Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable. Logs for this revision might contain more information. Logs URL: https://console.cloud.google.com/logs/viewer?project=xxxxxxxx&resource=cloud_run_revision/service_name/consume-sales-order-functional-test/revision_name/consume-sales-order-functional-test-00002-vib&advancedFilter=resource.type%3D%22cloud_run_revision%22%0Aresource.labels.service_name%3D%22consume-sales-order-functional-test%22%0Aresource.labels.revision_name%3D%22consume-sales-order-functional-test-00002-vib%22

I understand that I am probably getting this error because Cloud Run expects the container to listen on the port given by the PORT environment variable. But my pipeline is not a web application, so it never starts a web server.

Any ideas on how to work around this?

Best answer

If you don't have a web application listening on HTTP on the given $PORT number, that application is not suitable for Cloud Run.
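
To make that requirement concrete, here is a minimal sketch of what "listening on HTTP on $PORT" looks like in Python, using only the standard library. The names `HealthHandler`, `start_health_server`, and `run_with_health_server` are hypothetical, and `run_pipeline` stands in for whatever `main.py deploy local` ends up calling. It only satisfies the startup check from the error message. A Cloud Run service may still throttle CPU or scale instances down when no requests are in flight, so this does not by itself make a long-running streaming job a good fit.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers every GET with 200 so the platform sees an HTTP listener."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep per-request logs out of the pipeline output


def start_health_server(port=None):
    """Start the listener in a daemon thread and return the server object."""
    if port is None:
        # Cloud Run injects PORT; 8080 is used here as a local fallback.
        port = int(os.environ.get("PORT", "8080"))
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_with_health_server(run_pipeline, port=None):
    """Run a blocking callable while the HTTP listener stays up.

    run_pipeline is a hypothetical stand-in for the pipeline entry point.
    """
    server = start_health_server(port)
    try:
        run_pipeline()
    finally:
        server.shutdown()
        server.server_close()
```

With something like this in place, the container's entry point would call `run_with_health_server` with the pipeline's own start function instead of invoking the pipeline directly.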