Spark DataFrame to Google Cloud PubSub


I want to stream or batch-load data from a Spark DataFrame to Pub/Sub. I came across some libraries:

  1. Apache Bahir: only useful for streaming data from Pub/Sub. https://bahir.apache.org/docs/spark/2.2.1/spark-streaming-pubsub/
  2. Pub/Sub Lite Connector: capable of writing to Pub/Sub Lite; not sure if this works for Pub/Sub.


You cannot use the Pub/Sub Lite connector to write messages to Pub/Sub. Although Pub/Sub and Pub/Sub Lite are both horizontally scalable, managed messaging services, they differ enough to be two separate products.

You can refer to the Google Cloud documentation to check the differences between Pub/Sub and Pub/Sub Lite. From the doc:

Pub/Sub is usually the default solution for most application integration and analytics use cases.
Pub/Sub Lite is only recommended for applications where achieving extremely low cost justifies some additional operational work.

To stream or batch-load data from a Spark DataFrame to Pub/Sub, you can use Apache Bahir’s Pub/Sub connector.
Google Cloud Platform has an example that uses Apache Bahir’s Spark Streaming connector for Google Cloud Pub/Sub.
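
For the batch case, one alternative that does not go through Bahir is to publish each DataFrame partition with the google-cloud-pubsub Python client. The following is only a minimal sketch under assumptions: it is not taken from the Bahir connector or from the Google example, the helper names (encode_row, publish_partition, make_partition_writer) are hypothetical, rows are sent as JSON, and the executors need the google-cloud-pubsub package and credentials.

```python
import json
from typing import Callable, Iterable, Mapping


def encode_row(row: Mapping) -> bytes:
    """Serialize one row (as a dict) to UTF-8 JSON; one row becomes one message."""
    # default=str is an assumption: it stringifies dates, decimals, etc.
    return json.dumps(row, sort_keys=True, default=str).encode("utf-8")


def publish_partition(rows: Iterable[Mapping],
                      publish: Callable[[bytes], object]) -> int:
    """Publish every row of one partition and return how many were sent.

    `publish` takes the message bytes and returns a future-like object
    with a .result() method (as PublisherClient.publish does).
    """
    futures = [publish(encode_row(row)) for row in rows]
    for future in futures:
        future.result()  # block until acknowledged; raises on failure
    return len(futures)


def make_partition_writer(project_id: str, topic_id: str):
    """Build a function suitable for DataFrame.foreachPartition."""
    def write(rows):
        # Imported here so the client is created on the executor,
        # not pickled from the driver.
        from google.cloud import pubsub_v1

        publisher = pubsub_v1.PublisherClient()
        topic_path = publisher.topic_path(project_id, topic_id)
        publish_partition(
            (row.asDict(recursive=True) for row in rows),
            lambda data: publisher.publish(topic_path, data),
        )
    return write
```

With a DataFrame df, this would be used as df.foreachPartition(make_partition_writer("my-project", "my-topic")), where both names are placeholders. The sketch keeps all futures of a partition in memory, so very large partitions may need to be repartitioned or flushed in chunks.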